The set of solutions of random constraint satisfaction problems (zero energy groundstates of mean-field diluted spin glasses) undergoes several structural phase transitions as the amount of constraints is increased. This set first breaks down into a large number of well separated clusters. At the freezing transition, which is in general distinct from the clustering one, some variables (spins) take the same value in all solutions of a given cluster. In this paper we study the critical behavior around the freezing transition, which appears in the unfrozen phase as the divergence of the sizes of the rearrangements induced in response to the modification of a variable. The formalism is developed on generic constraint satisfaction problems and applied in particular to the random satisfiability of boolean formulas and to the coloring of random graphs. The computation is first performed in random tree ensembles, for which we underline a connection with percolation models and with the reconstruction problem of information theory. The validity of these results for the original random ensembles is then discussed in the framework of the cavity method.

The theory of computational complexity [1] establishes a classification of constraint satisfaction problems (CSP) according to their difficulty in the worst case. For concreteness let us introduce the three problems we shall use as running examples in the paper:

• $k$-XORSAT. Find a vector $x$ of boolean variables satisfying the linear equations $Ax = b \pmod 2$, where each row of the 0/1 matrix $A$ contains exactly $k$ non-null elements, and $b$ is a given boolean vector.

• $q$-coloring ($q$-COL). Given a graph, assign one of $q$ colors to each of its vertices, without giving the same color to the two extremities of an edge.

• $k$-satisfiability ($k$-SAT). Find a solution of a boolean formula made of the conjunction (logical AND) of clauses, each made of the disjunction (logical OR) of $k$ literals (a variable or its logical negation).



Each of these problems admits several variants. In the decision version one has to assert the existence or not of a solution, for instance a proper coloring of a given graph. More elaborate questions are the estimation of the number of such solutions, or, in the absence of solution, the discovery of optimal configurations, for instance colorings minimizing the number of monochromatic edges. The decision variants of the three examples stated above fall into two distinct complexity classes: $k$-XORSAT is in the P class, while the two others are NP-complete for $k, q \ge 3$ (see [2] for a classification of generic boolean CSPs). This means that the existence of a solution of the XORSAT problem can be decided in a time growing polynomially with the number of variables, for any instance of the problem; one can indeed use the Gaussian elimination algorithm. On the contrary no fast algorithm capable of solving every coloring or satisfiability problem is known, and the existence of such a polynomial time algorithm is considered highly improbable.
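The polynomial-time decision procedure for XORSAT mentioned above can be sketched as follows. This is only an illustrative Gauss-Jordan elimination over GF(2); the function name `solve_xorsat` and the bitmask encoding of the equations are assumptions made for this example, not part of the original text.

```python
def solve_xorsat(rows, b, n):
    """Return one solution x (list of 0/1) of A x = b (mod 2), or None if there is none.

    rows[a] lists the variable indices with a non-null entry in row a of A,
    b[a] is the right-hand side of equation a, n is the number of variables.
    """
    full = (1 << n) - 1          # mask selecting the n left-hand-side bits
    pivots = {}                  # pivot column -> fully reduced equation
    for idx, rhs in zip(rows, b):
        eq = rhs << n            # bit n stores the right-hand side
        for i in idx:
            eq ^= 1 << i
        # Eliminate every pivot column already present in this equation.
        for col, p in pivots.items():
            if (eq >> col) & 1:
                eq ^= p
        lhs = eq & full
        if lhs == 0:
            if eq >> n:          # the equation reduced to 0 = 1
                return None
            continue             # redundant equation
        col = lhs.bit_length() - 1
        # Keep the stored equations reduced with respect to the new pivot.
        for c in pivots:
            if (pivots[c] >> col) & 1:
                pivots[c] ^= eq
        pivots[col] = eq
    x = [0] * n                  # free variables are set to 0
    for col, p in pivots.items():
        x[col] = (p >> n) & 1
    return x


rows = [[0, 1, 2], [1, 2, 3], [0, 2, 3]]
b = [1, 0, 1]
x = solve_xorsat(rows, b, 4)
```

The number of elementary operations grows polynomially with the number of variables and equations, in line with the statement that XORSAT belongs to P.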

This notion of computational complexity, being based on worst-case considerations, could overlook the possibility that "most" of the instances of an NP problem are in fact easy and that the difficult cases are very rare. Random ensembles of problems have thus been introduced in order to give a quantitative content to this notion of typical instances; a property of a problem will be considered as typical if its probability (with respect to the random choice of the instance) goes to one in the limit of large problem sizes. Most random ensembles depend on an external parameter that can be varied continuously. In the coloring problem one can for instance consider the traditional Erdős-Rényi random graphs [3], which are parameterized by their mean connectivity $c$. For (XOR)SAT instances this role is played by the ratio $\alpha$ of the number of constraints (clauses for SAT or rows in the matrix for XORSAT) to the number of variables. A remarkable threshold phenomenon, first observed numerically, occurs when this parameter is varied: when a particular value $c_s$, $\alpha_s$ is crossed from below, the instances go from typically satisfiable to typically unsatisfiable. This statement has been rigorously proven for XORSAT [1, [1] and for 2-SAT; in the other cases it is only a largely accepted conjecture, with sharpness condition on the width of the transition window Q and bounds on its possible location [l3|.

Threshold phenomena are widely studied in statistical mechanics under the name of phase transitions. There is moreover a natural analogy between optimization problems and statistical mechanics; if one defines the energy as the number of violated constraints, for instance the number of monochromatic edges, the optimal configurations of a problem coincide with the groundstates of the associated physical system, an antiferromagnetic Potts model in the coloring case. This analogy triggered a large amount of research, relying on methods of statistical mechanics of disordered systems originally devised for the study of mean-field spin-glasses [ll|. Early examples of this approach for the satisfiability and coloring problems can be found in [l^, .

One of the most interesting outcomes of this line of research [ij, [IB] has been the suggestion that other structural threshold phenomena take place before the satisfiability one$^1$. The set of solutions of a random CSP, viewed as a subset of the whole configuration space, is smooth at low values of the constraint ratio but becomes fragmented into clusters of solutions for intermediate values of the control parameter, $\alpha \in [\alpha_d, \alpha_s]$. This clustering transition has been rigorously demonstrated in the XORSAT case [1, 0], for which it has a simple geometric interpretation: $\alpha_d$ is indeed the threshold for the percolation of the 2-core of the hypergraph underlying the CSP; between $\alpha_d$ and $\alpha_s$ there is typically a finite fraction of the variables and constraints in a peculiar sub-formula known as the backbone. Every solution of the backbone gives birth to a cluster of the complete formula. The variables of the backbone are said to be frozen in a given cluster, i.e. they take the same value in all the solutions belonging to a cluster; this is merely a consequence of the definition of a cluster in this case.

Establishing a precise and generic definition of the clusters is not an easy task, not to speak of proving tight rigorous results on their existence or properties (for recent results in this direction see EES El). Even at the heuristic level, it was recently argued [20], [21], [22] that the computation of $\alpha_d$ for random satisfiability (or $c_d$ for coloring) by previous statistical mechanics studies [23], [24] was incorrect. Roughly speaking, in these two models, the sizes of the clusters can have large fluctuations [25] that must be taken into consideration. In [20] the existence of yet another threshold (for $k, q \ge 4$), $\alpha_c \in [\alpha_d, \alpha_s]$, was also pointed out; this condensation threshold separates two clustered regimes, one where the relevant clusters are exponentially numerous (for smaller values of $\alpha$) and the other where there is only a sub-exponential number of them.

The clustering transition of XORSAT, because of its geometric interpretation, is certainly a good example on which to develop one's intuition of the clustering phenomenon. There are however at least two aspects in which XORSAT departs from other CSPs and where the intuitive picture must be taken with a grain of salt. The first is that the clusters of XORSAT all have the same size, because of the linear algebra structure of its set of solutions. For this reason the condensation phenomenon is not present in XORSAT. The second point is that clusters of XORSAT have frozen variables, by definition. There is however no obvious reason why this should be true for any CSP. On the contrary we shall argue in this paper that in general frozen variables appear at another value $\alpha_f$ of the control parameter, with generically $\alpha_f \in [\alpha_d, \alpha_s]$. This was one of the results of [21], [22]; here we shall develop this point and quantify the precursors of the transition before $\alpha_f$. For this we build upon the study of XORSAT presented in [2^ and extend it to generic CSPs, in particular satisfiability and coloring. The central notion studied here is the one of rearrangement (to some extent related to the long-range frustration of [l^l): given an initial solution of a CSP and a variable $i$ that one would like to modify, a rearrangement is a path in configuration space that starts from the initial solution and leads to another solution where the value of the $i$'th variable is changed with respect to the initial one. The minimal length of such a path is a measure of how constrained the variable $i$ was in the initial configuration. In intuitive terms this length diverges with the system size when the variable was frozen in the initial cluster.

The paper is organized as follows. In Sec. II we introduce a generic class of CSPs and make precise the definition of the rearrangements. Sections III and IV are devoted to modified (tree) random ensembles in which the approach is essentially rigorous; the former presents detailed computations in a rather generic setting and its application to the three selected examples, while the latter presents the numerical results and discusses the generic phenomenology at the approach of the freezing transition in the tree ensembles, with some more technical details deferred to an appendix. The computation is reconsidered in the perspective of the reconstruction problem in Sec. V. The applicability of these results to the original ensembles is discussed in Sec. VI through a precise statement of the hypotheses of the cavity method. Conclusions and perspectives for future work are presented in Sec. VII.

II. DEFINITIONS 

We introduce here some notations and definitions for a class of problems that encompasses the three examples we shall treat in more detail. The degrees of freedom of the CSP will be $N$ variables $\sigma_i$ taking values in a discrete alphabet $\mathcal{X}$; global configurations are denoted $\underline{\sigma} = (\sigma_1, \dots, \sigma_N)$. An instance (or formula) $F$ of the CSP is a set of $M$ constraints between the variables $\sigma_i$. The $a$'th constraint is defined by a function $\psi_a(\underline{\sigma}_a) \to \{0, 1\}$, which depends on the configuration $\underline{\sigma}_a$ of a subset of the variables and is equal to 1 if the constraint is satisfied, 0 otherwise. The set $S_F \subset \mathcal{X}^N$ of solutions of $F$ is composed of the configurations satisfying simultaneously all the constraints. It can thus be formally defined as $S_F = \{\underline{\sigma} \mid \psi_F(\underline{\sigma}) = 1\}$, where the indicator function $\psi_F$ is

$$\psi_F(\underline{\sigma}) = \prod_{a=1}^{M} \psi_a(\underline{\sigma}_a) . \tag{1}$$

When the formula admits a positive number of solutions, call it $Z_F$, the uniform measure over the solutions is denoted $\mu_F(\underline{\sigma}) = \psi_F(\underline{\sigma})/Z_F$.

FIG. 1: An example of factor graph. The neighborhoods are for instance $\partial i = \{a, b, c, d\}$ and $\partial i \setminus a = \{b, c, d\}$.

$^1$ It was of course already known that the algorithms rigorously studied to derive lower bounds on the satisfiability threshold work only up to values of $\alpha$ smaller than $\alpha_s$ 0|. These values are however largely algorithm-dependent and not directly related to a change of structure in the configuration space.

Factor graphs [28] provide a useful representation of a CSP. These graphs (see Fig. 1 for an example) have two kinds of nodes. Variable nodes (filled circles on the figure) are associated to the degrees of freedom $\sigma_i$, while constraint nodes (empty squares) represent the clauses $\psi_a$. An edge between constraint $a$ and variable $i$ is drawn whenever $\psi_a$ depends on $\sigma_i$. The neighborhood $\partial a$ of a constraint node is the set of variable nodes that appear in $\psi_a$. Conversely $\partial i$ is the set of constraints that depend on $\sigma_i$. We shall conventionally use the indices $i, j, \dots$ for the variable nodes, $a, b, \dots$ for the constraints, and denote by $\setminus$ the subtraction from a set. Two variable nodes are called adjacent if they appear in a common constraint. The graph distance between two variable nodes $i$ and $j$ is the number of constraint nodes encountered on a shortest path linking $i$ and $j$ (formally infinite if the two variables are not in the same connected component of the graph).

The three illustrative examples presented above admit a simple representation in this formalism:

• $k$-XORSAT. The degrees of freedom of this CSP are boolean variables that we shall represent, following the physics conventions, by Ising spins, $\mathcal{X} = \{-1, +1\}$. Each constraint involves a subset of $k$ variables, $\underline{\sigma}_a = (\sigma_{i_1}, \dots, \sigma_{i_k})$, and reads $\psi_a(\underline{\sigma}_a) = \mathbb{I}(\sigma_{i_1} \cdots \sigma_{i_k} = J_a)$, where here and in the following $\mathbb{I}(\cdot)$ denotes the indicator function of an event and $J_a \in \{-1, +1\}$ is a given constant. This is equivalent to the definition given in the introduction: defining $x_i, b_a \in \{0, 1\}$ such that $\sigma_i = (-1)^{x_i}$ and $J_a = (-1)^{b_a}$, the constraint imposed by $\psi_a$ reads $x_{i_1} + \dots + x_{i_k} = b_a \pmod 2$, which is nothing but the $a$'th row of the matrix equation $Ax = b$. The addition modulo 2 of Boolean variables can also be read as the binary exclusive OR operation, hence the name XORSAT used for this problem.

• $q$-COL. Here $\mathcal{X} = \{1, \dots, q\}$ is the set of allowed colors on the $N$ vertices of a graph. Each edge $a$ connecting the vertices $i$ and $j$ prevents them from being of the same color: $\psi_a(\sigma_i, \sigma_j) = \mathbb{I}(\sigma_i \neq \sigma_j)$.

• $k$-SAT. As in the XORSAT problem one deals with boolean variables represented by Ising spins, but in each clause the XOR operation between variables is replaced by an OR between literals (i.e. a variable or its negation). In other words a constraint $a$ is unsatisfied only when all literals evaluate to false, or in Ising terms when all spins $\sigma_i$ involved in the constraint take their wrong value, that we denote $J_i^a$: $\psi_a(\underline{\sigma}_a) = 1 - \mathbb{I}(\sigma_i = J_i^a \ \forall i \in \partial a)$.
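As an illustration, the three constraint functions can be written as small indicator functions. The function names (`psi_xorsat`, `psi_col`, `psi_sat`) and the argument conventions are choices made for this sketch only.

```python
def psi_xorsat(spins, J):
    """1 if the product of the Ising spins equals J, else 0."""
    prod = 1
    for s in spins:
        prod *= s
    return int(prod == J)


def psi_col(sigma_i, sigma_j):
    """1 if the two end points of the edge have different colors, else 0."""
    return int(sigma_i != sigma_j)


def psi_sat(spins, wrong):
    """0 only when every spin takes its 'wrong' value (all literals false), else 1."""
    return 1 - int(all(s == w for s, w in zip(spins, wrong)))


def xor_equation_holds(x, b):
    """Boolean form of the XORSAT constraint: x_1 + ... + x_k = b (mod 2)."""
    return int(sum(x) % 2 == b)
```

With the change of variables $\sigma_i = (-1)^{x_i}$, $J_a = (-1)^{b_a}$, `psi_xorsat` and `xor_equation_holds` agree on every input, which is the equivalence stated in the first item above.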

The random ensembles of CSP instances we shall use are defined as follows:

• $k$-XORSAT. For each of the $M$ clauses $a$ a $k$-uplet of distinct variable indices $i_1, \dots, i_k$ is chosen uniformly at random among the $\binom{N}{k}$ possible ones, and the constant $J_a$ is taken to be $\pm 1$ with probability one-half.

• $q$-coloring. A set of $M$ among the $\binom{N}{2}$ possible edges $a = \{i, j\}$ is chosen uniformly at random.

• $k$-SAT. The variables are chosen as in the XORSAT ensemble, and the $J_i^a$ are independently taken to be $\pm 1$ with equal probability.
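A minimal sketch of how instances of these three ensembles could be sampled is given below. The function names and the output encoding (tuples of indices together with the constants $J$) are assumptions of this example; variables are numbered from 0.

```python
import random
from itertools import combinations


def random_xorsat(N, M, k, rng):
    """M clauses, each a sorted k-uplet of distinct indices and a constant J = +/-1."""
    return [(tuple(sorted(rng.sample(range(N), k))), rng.choice((-1, 1)))
            for _ in range(M)]


def random_coloring(N, M, rng):
    """A set of M distinct edges chosen uniformly among the N(N-1)/2 possible ones."""
    return rng.sample(list(combinations(range(N), 2)), M)


def random_sat(N, M, k, rng):
    """Indices chosen as in XORSAT, with one independent sign per literal."""
    return [(tuple(sorted(rng.sample(range(N), k))),
             tuple(rng.choice((-1, 1)) for _ in range(k)))
            for _ in range(M)]


rng = random.Random(0)
xorsat = random_xorsat(20, 10, 3, rng)
graph = random_coloring(20, 15, rng)
sat = random_sat(20, 10, 3, rng)
```

Enumerating all $\binom{N}{2}$ edges is only reasonable for small $N$; it is used here to make the uniform choice without replacement explicit.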

For the coloring problem this construction is the classical Erdős-Rényi random graph $G(N, M)$; the two other cases are its random hypergraph generalization. We are interested in the thermodynamic limit of large instances where $N$ and $M$ both diverge with a fixed ratio $\alpha = M/N$ $^2$. Random (hyper)graphs have many interesting properties in this limit. For instance the degree of a variable node of the factor graph converges to a Poisson law of average $\alpha k$ for the XORSAT and SAT cases, and $2\alpha$ for the coloring ensemble. For clarity in the latter case we shall use the notation $c = 2\alpha$ for the average connectivity. Moreover, picking at random one variable node $i$ and isolating the subgraph induced by the variable nodes at a graph distance smaller than a given constant $L$ yields, with a probability going to one in the thermodynamic limit, a (random) tree. This tree can be described by a Galton-Watson branching process: the root $i$ belongs to $l$ constraints, where $l$ is a Poisson random variable of parameter $\alpha k$ ($c$ in the coloring case). The variable nodes adjacent to $i$ give themselves birth to new constraints, in numbers which are independently Poisson distributed with the same parameter. This reproduction process is iterated on $L$ generations, until the variable nodes at graph distance $L$ from the initial root $i$ have been generated.
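The branching process just described can be simulated in a few lines. The sketch below only records the number of variable nodes in each generation; the helper `poisson` (Knuth's multiplication method) and the function name `local_tree_generations` are assumptions of this example.

```python
import math
import random


def poisson(lam, rng):
    """Sample a Poisson variable of mean lam (Knuth's method, fine for small lam)."""
    threshold = math.exp(-lam)
    count, prod = 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= threshold:
            return count
        count += 1


def local_tree_generations(alpha, k, L, rng):
    """Numbers of variable nodes at graph distance 0, 1, ..., L from the root.

    Every variable node belongs to a Poisson(alpha * k) number of new constraints,
    and each new constraint brings k - 1 new variable nodes.
    """
    sizes = [1]
    for _ in range(L):
        n_constraints = sum(poisson(alpha * k, rng) for _ in range(sizes[-1]))
        sizes.append(n_constraints * (k - 1))
    return sizes


sizes = local_tree_generations(alpha=1.0, k=3, L=4, rng=random.Random(1))
```

For the coloring ensemble one would take `k = 2` and `alpha = c / 2`, so that the Poisson parameter is the mean connectivity $c$.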

We now define the main object of our study. First recall the well-known definition of the Hamming distance between two configurations, $d(\underline{\sigma}, \underline{\tau}) = \sum_{i=1}^{N} \mathbb{I}(\sigma_i \neq \tau_i)$. Consider an initial solution of the formula, $\underline{\sigma} \in S_F$, and imagine one wants to modify the value of the variable $i$. A rearranged solution is a new configuration $\underline{\tau} \in S_F$ such that $\tau_i \neq \sigma_i$. The minimal size of a rearrangement (m.s.r.) for variable $i$ starting from $\underline{\sigma} \in S_F$ is defined as

$$n_i(\underline{\sigma}, F) = \min_{\underline{\tau}} \{ d(\underline{\sigma}, \underline{\tau}) \mid \underline{\tau} \in S_F, \ \tau_i \neq \sigma_i \} , \tag{2}$$

and measures how costly (in terms of Hamming distance) it is to perturb the solution at variable $i$ $^3$. It can also be viewed as the minimal length of a path in configuration space, modifying one variable at a time, between $\underline{\sigma}$ and another solution with a different value of variable $i$, thus providing a quantification of how constrained this variable initially was. We shall also speak of the support of a rearrangement as the set of variables which differ in the initial and final configurations, the size of the rearrangement being the cardinality of its support.
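For very small instances the definition of the m.s.r. can be checked by exhaustive enumeration. The following sketch is a brute-force illustration only (its cost is exponential in $N$); the encoding of a formula as a list of `(indices, psi)` pairs and the names `solutions` and `msr` are assumptions of this example.

```python
from itertools import product


def solutions(N, alphabet, constraints):
    """All configurations satisfying every constraint (indices, psi)."""
    return [s for s in product(alphabet, repeat=N)
            if all(psi(tuple(s[i] for i in idx)) for idx, psi in constraints)]


def msr(i, sigma, sols):
    """Minimal Hamming distance from sigma to a solution with a different sigma_i.

    Returns N + 1 when variable i takes the same value in every solution.
    """
    best = len(sigma) + 1
    for tau in sols:
        if tau[i] != sigma[i]:
            best = min(best, sum(s != t for s, t in zip(sigma, tau)))
    return best


def neq(values):
    return values[0] != values[1]


# Coloring of the path 0 - 1 - 2.
path = [((0, 1), neq), ((1, 2), neq)]
sols_q2 = solutions(3, (0, 1), path)
sols_q3 = solutions(3, (0, 1, 2), path)
```

With two colors the whole path has to be recolored to change any vertex, while with three colors the middle vertex of the solution `(0, 1, 0)` can be changed alone.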

In general the m.s.r. will depend on the starting configuration; we thus define its distribution with respect to a uniform choice of $\underline{\sigma}$ (in abbreviation m.s.r.d.),

$$q_n^{(F,i)} = \sum_{\underline{\sigma}} \mu_F(\underline{\sigma}) \, \delta_{n, n_i(\underline{\sigma}, F)} . \tag{3}$$

There should be no possibility of confusion between the distribution $q_n$ and the number $q$ of allowed colors in the $q$-COL problem. When dealing with random CSPs we shall study the average of this distribution,

$$q_n = \mathbb{E} \, q_n^{(F,i)} , \tag{4}$$

where the expectation is taken with respect to the instance ensemble (in the cases considered here all variable nodes are equivalent on average). Its behavior in the thermodynamic limit will drastically change with the connectivity parameter $\alpha$ (or $c$ for the coloring). We shall indeed define the threshold $\alpha_f$ ($c_f$) as the value above which a finite fraction of the distribution is supported on sizes $n$ that diverge with the number of variables. In pictorial terms clusters acquire frozen variables at this point: their rearrangements must be of diverging size and thus lead to a final solution outside the initial cluster.
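The distribution of Eq. (3) can likewise be obtained by enumeration on a toy instance. This is a hedged brute-force sketch: the `(indices, psi)` encoding and the name `msrd` are assumptions of the example, and exact rational weights are used only to make the result easy to read.

```python
from fractions import Fraction
from itertools import product


def msrd(i, N, alphabet, constraints):
    """Brute-force q_n^{(F,i)}: law of the m.s.r. of variable i for a uniform solution."""
    sols = [s for s in product(alphabet, repeat=N)
            if all(psi(tuple(s[j] for j in idx)) for idx, psi in constraints)]
    weight = Fraction(1, len(sols))           # uniform measure mu_F on the solutions
    dist = {}
    for sigma in sols:
        n = N + 1                             # convention for a frozen variable
        for tau in sols:
            if tau[i] != sigma[i]:
                n = min(n, sum(s != t for s, t in zip(sigma, tau)))
        dist[n] = dist.get(n, Fraction(0)) + weight
    return dist


def neq(values):
    return values[0] != values[1]


# 3-coloring of the path 0 - 1 - 2, middle vertex.
dist = msrd(1, 3, (0, 1, 2), [((0, 1), neq), ((1, 2), neq)])
```

On this path half of the twelve proper colorings have both end points of the same color, in which case the middle vertex can be recolored alone; in the other half one end point must be recolored as well.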

The computation of the average m.s.r.d. will be first undertaken in a random tree ensemble, mimicking the tree neighborhoods of the random graphs. The threshold for the freezing transition in these tree instances will be computed, along with a set of exponents characterizing the behavior of the average m.s.r.d. when the transition is approached from the unfrozen phase. For clarity we shall denote by $\alpha_p$ instead of $\alpha_f$ the thresholds in the tree ensembles. We shall then argue in Sec. VI, on the basis of the non-rigorous cavity method, that for some values of $\alpha$ and $k$ the properties of the random graph instances are correctly described by the computations in the tree ensemble. In particular for large enough values of $k$ we shall conjecture that $\alpha_p = \alpha_f$. We will also explain how the computation has to be amended to handle the more elaborate version of the cavity method (with replica-symmetry breaking), and what are the expectedly universal characteristics of the critical behavior at the freezing transition.



$^2$ In this limit the quantities studied in this paper are not affected by some variations around these models. For instance in the coloring case $G(N, M)$ can be replaced by the ensemble $G(N, p)$ where each edge is present independently with probability $p = 2\alpha/N$, such that the average number of edges is close to $M$. The choice of the (hyper)edges with or without replacement is also irrelevant.

$^3$ If $\sigma_i$ takes the same value in every solution we formally define $n_i = N + 1$.

FIG. 2: The cavity graphs $F_{a \to i}$ and $F_{i \to a}$ obtained from the example of Fig. 1.

III. MINIMAL SIZE REARRANGEMENTS IN RANDOM TREE ENSEMBLES 

In this and the next Section all the instances of CSP encountered have an underlying factor graph which is a finite tree. Given such a formula $F$ (or equivalently its factor graph) and an edge $i - a$ between a variable node $i$ and an adjacent constraint node $a$, we define two sub-formulas (cavity graphs) $F_{i \to a}$ and $F_{a \to i}$. $F_{i \to a}$ is obtained from $F$ by deleting the branch of the formula rooted at $i$ starting with constraint $a$. Conversely $F_{a \to i}$ is obtained by keeping only this branch (see Fig. 2). We also decompose the configuration $\underline{\sigma}$ as $(\sigma_i, \underline{\sigma}_{a \to i}, \underline{\sigma}_{i \to a})$, where $\underline{\sigma}_{a \to i}$ (resp. $\underline{\sigma}_{i \to a}$) is the configuration of the variable nodes in $F_{a \to i}$ (resp. $F_{i \to a}$) distinct from $i$. The notation $\underline{\sigma}_{\setminus i}$ will be used for the configuration of all variables except $i$. The computation, based on the natural recursive structure of trees, will be performed in three steps: we shall first see how to obtain $n_i(\underline{\sigma}, F)$, then its distribution with respect to $\underline{\sigma}$, $q_n^{(F,i)}$, which shall finally be averaged over a random tree ensemble. For notational simplicity $F$ will often be kept implicit. This approach is presented in a general setting before the three specific cases of XORSAT, COL and SAT are treated.



A. General case 

1. Given tree, given $\underline{\sigma}$

The computation of the m.s.r. on a tree factor graph can be performed in a recursive way. One has to determine, for each value of $\tau_i \neq \sigma_i$, the cost, in terms of Hamming distance, of the modification $\sigma_i \to \tau_i$. This can be done by computing separately these costs in the factor graphs $F_{a \to i}$ for all the constraint nodes $a$ around $i$ and then patching together the rearrangements of the sub-formulae. Rearranging a factor graph $F_{a \to i}$ amounts to looking for a configuration of the variables $j \in \partial a \setminus i$ which satisfies the interaction $a$ and which provokes a minimal propagation of the rearrangement in the branches $F_{j \to a}$.

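To formalize this reasoning we introduce a $q$-component vectorial notation, $\vec{n}$, where the rows of the vectors are indexed by a spin value in $\mathcal{X}$, and we shall denote by $[\vec{n}]_\tau$ the $\tau$'th component of $\vec{n}$. We define $\vec{n}_i(\underline{\sigma})$ as the m.s.r. for $i$ starting from the initial configuration $\underline{\sigma}$, and with the final value $\tau_i$ encoded in the row of the vector:

$$[\vec{n}_i(\underline{\sigma})]_{\tau_i} = \min \{ d(\underline{\sigma}, \underline{\tau} = (\tau_i, \underline{\tau}_{\setminus i})) \mid \underline{\tau} \in S_F \} . \tag{5}$$

The original quantity $n_i(\underline{\sigma})$ is obtained from this more detailed one as $n_i(\underline{\sigma}) = \min_{\tau_i \neq \sigma_i} [\vec{n}_i(\underline{\sigma})]_{\tau_i}$. The recursive computation of $\vec{n}_i$ is performed in terms of vectorial messages on the directed edges of the factor graph, $\vec{n}_{i \to a}$ and $\vec{n}_{a \to i}$. The former, $\vec{n}_{i \to a}(\sigma_i, \underline{\sigma}_{i \to a})$, is defined exactly as $\vec{n}_i$, with the cavity graph $F_{i \to a}$ replacing the original formula $F$. The latter reads

$$[\vec{n}_{a \to i}(\underline{\sigma}_{a \to i})]_{\tau_i} = \min \{ d(\underline{\sigma}_{a \to i}, \underline{\tau}_{a \to i}) \mid (\tau_i, \underline{\tau}_{a \to i}) \in S_{F_{a \to i}} \} . \tag{6}$$

Note that here one does not count the cost of flipping the root variable, which avoids overcounting when gluing together the cavity graphs. A moment of thought reveals that these messages obey the following recursive equations:

$$\vec{n}_{a \to i}(\underline{\sigma}_{a \to i}) = f(\{\vec{n}_{j \to a}(\sigma_j, \underline{\sigma}_{j \to a})\}_{j \in \partial a \setminus i}) , \qquad \vec{n}_{i \to a}(\sigma_i, \underline{\sigma}_{i \to a}) = g_{\sigma_i}(\{\vec{n}_{b \to i}(\underline{\sigma}_{b \to i})\}_{b \in \partial i \setminus a}) , \tag{7}$$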
where the functions $f$ and $g$ are given by

$$[f(\{\vec{n}_{j \to a}\}_{j \in \partial a \setminus i})]_{\tau_i} = \min_{\underline{\tau}_{a \setminus i} \, : \, \psi_a(\tau_i, \underline{\tau}_{a \setminus i}) = 1} \ \sum_{j \in \partial a \setminus i} [\vec{n}_{j \to a}]_{\tau_j} , \tag{8}$$

$$[g_\sigma(\vec{n}_1, \dots, \vec{n}_l)]_\tau = \mathbb{I}(\tau \neq \sigma) + [\vec{n}_1]_\tau + \dots + [\vec{n}_l]_\tau . \tag{9}$$

To lighten the notations we keep implicit the dependence of the functions $f$ and $g$ on the edges of the factor graph. These equations can be easily solved, for a given initial satisfying assignment $\underline{\sigma}$, noting that the messages from the leaf variable nodes $i$ satisfy the boundary condition $\vec{n}_{i \to a}(\sigma_i) = \vec{o}(\sigma_i)$, where we define $[\vec{o}(\sigma)]_\tau = \mathbb{I}(\sigma \neq \tau)$. The recursions (7) can then be successively applied to determine the value of all messages in a single sweep from the exterior of the graph towards its center. When this is done the m.s.r. for a variable $i$ is obtained from

$$\vec{n}_i(\underline{\sigma}) = g_{\sigma_i}(\{\vec{n}_{a \to i}(\underline{\sigma}_{a \to i})\}_{a \in \partial i}) . \tag{10}$$

Note that this recursive approach provides not only the size of a minimal rearrangement, but also a final configuration achieving this bound. One just has to keep track, along with the size information encoded in the messages $\vec{n}$, of the configuration reaching the minimum in Eq. (5) (if there are several of them one is chosen arbitrarily). By construction the support of these optimal rearrangements is connected.
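The recursion on the vectors of sizes can be sketched as follows for a generic tree formula. This is an illustrative implementation under stated assumptions: a formula is encoded as a list of `(indices, psi)` pairs, messages are Python dictionaries indexed by the spin values, and a brute-force enumeration (`msr_brute`) is included only as a cross-check on a small example.

```python
from itertools import product

INF = float("inf")


def msr_vector(i, sigma, alphabet, constraints, parent=None):
    """Variable-to-constraint message: for each final value tau of variable i,
    minimal number of changed variables in the subtree hanging from i
    (constraint `parent` excluded), variable i included."""
    vec = {tau: int(tau != sigma[i]) for tau in alphabet}      # o(sigma_i)
    for a, (idx, psi) in enumerate(constraints):
        if a == parent or i not in idx:
            continue
        others = [j for j in idx if j != i]
        subs = [msr_vector(j, sigma, alphabet, constraints, parent=a) for j in others]
        for tau in alphabet:
            # Cheapest values of the other variables of a that satisfy psi_a.
            best = INF
            for taus in product(alphabet, repeat=len(others)):
                values = dict(zip(others, taus))
                values[i] = tau
                if psi(tuple(values[j] for j in idx)):
                    best = min(best, sum(s[t] for s, t in zip(subs, taus)))
            vec[tau] += best        # contributions of the branches add up
    return vec


def msr_tree(i, sigma, alphabet, constraints):
    """Minimal size of a rearrangement of variable i on a tree formula."""
    vec = msr_vector(i, sigma, alphabet, constraints)
    return min(v for tau, v in vec.items() if tau != sigma[i])


def msr_brute(i, sigma, alphabet, constraints):
    best = INF
    for tau in product(alphabet, repeat=len(sigma)):
        ok = all(psi(tuple(tau[j] for j in idx)) for idx, psi in constraints)
        if ok and tau[i] != sigma[i]:
            best = min(best, sum(s != t for s, t in zip(sigma, tau)))
    return best


def neq(values):
    return values[0] != values[1]


# 3-coloring of a small tree with edges 0-1, 1-2, 1-3, 3-4.
edges = [((0, 1), neq), ((1, 2), neq), ((1, 3), neq), ((3, 4), neq)]
sigma = (0, 1, 0, 2, 0)
colors = (0, 1, 2)
```

The recursive calls recompute messages when several roots are queried; a single sweep from the leaves, as described in the text, would store each message once. An infinite value signals that no rearrangement to the corresponding final value exists.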



2. Given tree, distribution with respect to $\underline{\sigma}$

Following the program sketched above, we now introduce a probability distribution $\mu$ for the initial solution $\underline{\sigma}$ of the formula:

$$\mu(\underline{\sigma}) = \frac{1}{Z} \, \psi_F(\underline{\sigma}) \prod_{i \in B} \eta_{{\rm ext}, i}(\sigma_i) , \tag{11}$$

where $Z$ is a normalization constant, $B$ is a subset of the leaves of the factor graph, and the $\eta_{\rm ext}$ are probability laws on $\mathcal{X}$ that, by analogy with magnetic systems, we shall call fields. $\mu$ vanishes for configurations which do not satisfy the formula; if $B = \emptyset$ it is uniform on the set of solutions, otherwise the external fields $\eta_{\rm ext}$ can introduce a bias in the law (this possibility will prove useful in the following). We assume that the expression above remains well defined in the presence of the external fields, i.e. that they do not put a vanishing weight on the solutions of the formula.

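The absence of cycles in the factor graph induces a Markovian property of the measure $\mu$ which greatly simplifies its characterization. One can indeed compute recursively the marginals of the law on any subset of variable nodes, introducing on each directed edge of the factor graph another family of messages (cavity measures) $\nu_{a \to i}(\sigma_i)$ (resp. $\eta_{i \to a}(\sigma_i)$). These are the law of $\sigma_i$ in the measure associated to the cavity factor graph $F_{a \to i}$ (resp. $F_{i \to a}$), and are solutions of

$$\nu_{a \to i} = f(\{\eta_{j \to a}\}_{j \in \partial a \setminus i}) , \qquad f(\{\eta_{j \to a}\}_{j \in \partial a \setminus i})(\sigma_i) = \frac{1}{z(\{\eta_{j \to a}\}_{j \in \partial a \setminus i})} \sum_{\underline{\sigma}_{a \setminus i}} \psi_a(\sigma_i, \underline{\sigma}_{a \setminus i}) \prod_{j \in \partial a \setminus i} \eta_{j \to a}(\sigma_j) , \tag{12}$$

$$\eta_{i \to a} = g(\{\nu_{b \to i}\}_{b \in \partial i \setminus a}) , \qquad g(\{\nu_{b \to i}\}_{b \in \partial i \setminus a})(\sigma_i) = \frac{1}{z(\{\nu_{b \to i}\}_{b \in \partial i \setminus a})} \prod_{b \in \partial i \setminus a} \nu_{b \to i}(\sigma_i) , \tag{13}$$

where the functions $z$ are defined by normalization. Again for clarity we do not indicate explicitly the dependence of the functions $f$, $g$ and $z$ on the edges. The boundary conditions are $\eta_{i \to a} = \eta_{{\rm ext}, i}$ when $i$ is a leaf in $B$, $\eta_{i \to a} = \overline{\eta}$ (the uniform law on $\mathcal{X}$) if $i$ is a leaf not in $B$. This set of equations enjoys the same structure as the one on the $\vec{n}$'s (see Eq. (7)), and can also be solved in a sweep from the leaves of the factor graph. The marginals of $\mu$ for any connected subset of variables can be easily expressed in terms of the solution of this set of equations. For instance the marginal of a single variable reads

$$\mu(\sigma_i) = g(\{\nu_{a \to i}\}_{a \in \partial i})(\sigma_i) , \tag{14}$$

while the variables of a constraint, conditioned to the value of one of them, are drawn according to

$$\mu(\underline{\sigma}_{a \setminus i} \mid \sigma_i, \{\eta_{j \to a}\}_{j \in \partial a \setminus i}) = \frac{1}{z(\sigma_i, \{\eta_{j \to a}\}_{j \in \partial a \setminus i})} \, \psi_a(\sigma_i, \underline{\sigma}_{a \setminus i}) \prod_{j \in \partial a \setminus i} \eta_{j \to a}(\sigma_j) , \tag{15}$$

where again $z$ is a normalizing factor.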
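The recursive computation of the cavity measures can be illustrated on a small tree. The sketch below assumes the same `(indices, psi)` encoding of a formula as a list, stores the external fields in a dictionary `ext` (a hypothetical name), and uses exact rational arithmetic so that the result can be compared with a direct enumeration of the biased measure.

```python
from fractions import Fraction
from itertools import product


def normalize(w):
    z = sum(w.values())          # assumed non-zero: the fields do not kill all solutions
    return {s: v / z for s, v in w.items()}


def eta(i, alphabet, constraints, ext, parent=None):
    """Law of sigma_i in the cavity graph F_{i->a} (a = parent); marginal if parent is None."""
    w = {s: Fraction(ext[i][s]) if i in ext else Fraction(1) for s in alphabet}
    for a, (idx, _) in enumerate(constraints):
        if a != parent and i in idx:
            m = nu(a, i, alphabet, constraints, ext)
            for s in alphabet:
                w[s] *= m[s]
    return normalize(w)


def nu(a, i, alphabet, constraints, ext):
    """Law of sigma_i in the cavity graph F_{a->i}."""
    idx, psi = constraints[a]
    others = [j for j in idx if j != i]
    etas = [eta(j, alphabet, constraints, ext, parent=a) for j in others]
    w = {}
    for s in alphabet:
        total = Fraction(0)
        for taus in product(alphabet, repeat=len(others)):
            values = dict(zip(others, taus))
            values[i] = s
            if psi(tuple(values[j] for j in idx)):
                term = Fraction(1)
                for e, t in zip(etas, taus):
                    term *= e[t]
                total += term
        w[s] = total
    return normalize(w)


def brute_marginal(i, N, alphabet, constraints, ext):
    """Marginal of sigma_i by direct enumeration of the biased measure."""
    w = {s: Fraction(0) for s in alphabet}
    for sigma in product(alphabet, repeat=N):
        if all(psi(tuple(sigma[j] for j in idx)) for idx, psi in constraints):
            weight = Fraction(1)
            for j, field in ext.items():
                weight *= Fraction(field[sigma[j]])
            w[sigma[i]] += weight
    return normalize(w)


def neq(values):
    return values[0] != values[1]


colors = (0, 1, 2)
path = [((0, 1), neq), ((1, 2), neq)]                  # path 0 - 1 - 2
ext = {2: {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}}  # field on leaf 2
```

Calling `eta(i, colors, path, ext)` with no parent multiplies the messages of all constraints around $i$, which is the single-variable marginal; on a tree it coincides with the enumeration.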

We now have to compute the distribution of the minimal size rearrangements when the starting configuration $\underline{\sigma}$ is drawn from $\mu$. The generation of $\underline{\sigma}$ can be performed in a recursive broadcasting way: one first draws an arbitrarily chosen root variable $\sigma_i$ according to its marginal $\mu(\sigma_i)$. Because the factor graph is a tree, the law of the remaining variables factorizes on the different branches around $i$,

$$\mu(\underline{\sigma}_{\setminus i} \mid \sigma_i) = \prod_{a \in \partial i} \mu(\underline{\sigma}_{a \to i} \mid \sigma_i) . \tag{16}$$

For each branch $F_{a \to i}$ one proceeds by drawing the variables of $\underline{\sigma}_{a \setminus i}$, conditioned on $\sigma_i$ (see Eq. (15)). Then the value of $\sigma_j$ for each $j \in \partial a \setminus i$ conditions the generation of $\underline{\sigma}_{j \to a}$, which can itself be broken in subtrees as in Eq. (16). This process is repeated outwards until the leaves of the tree are reached.
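In the simplest case the broadcasting procedure takes a particularly transparent form. The sketch below is restricted, as an assumption of this example, to the $q$-coloring of a tree graph without external fields: the cavity measures are then uniform by the symmetry between colors, so that the root is drawn uniformly and each further vertex is drawn uniformly among the $q - 1$ colors differing from the one of its already generated neighbor. The adjacency-list encoding `adj` and the function name are hypothetical.

```python
import random


def broadcast_coloring(adj, q, root, rng):
    """Draw a uniform proper q-coloring of a tree, from the root outwards.

    adj maps each vertex to the list of its neighbors; the graph must be a tree.
    """
    sigma = {root: rng.randrange(q)}        # root drawn from its (uniform) marginal
    stack = [root]
    while stack:
        i = stack.pop()
        for j in adj[i]:
            if j not in sigma:
                # Conditional law of sigma_j given sigma_i: uniform on the other colors.
                sigma[j] = rng.choice([s for s in range(q) if s != sigma[i]])
                stack.append(j)
    return sigma


adj = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1, 4], 4: [3]}
sample = broadcast_coloring(adj, q=3, root=1, rng=random.Random(0))
```

For generic constraints or biased boundary fields the conditional law of each branch would instead be computed from the cavity measures, as in Eq. (15).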

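This observation leads us to introduce the distribution of the $\vec{n}$ messages with respect to the conditional distributions of the initial configuration,

$$q^{(i \to a, \sigma_i)}_{\vec{n}} = \sum_{\underline{\sigma}_{i \to a}} \mu(\underline{\sigma}_{i \to a} \mid \sigma_i) \, \delta_{\vec{n}, \vec{n}_{i \to a}(\sigma_i, \underline{\sigma}_{i \to a})} , \qquad \hat{q}^{(a \to i, \sigma_i)}_{\vec{n}} = \sum_{\underline{\sigma}_{a \to i}} \mu(\underline{\sigma}_{a \to i} \mid \sigma_i) \, \delta_{\vec{n}, \vec{n}_{a \to i}(\underline{\sigma}_{a \to i})} . \tag{17}$$

Combining the recursive computations of the messages $\vec{n}$ expressed in Eq. (7) and the recursive generation of the initial configuration $\underline{\sigma}$ leads to

$$\hat{q}^{(a \to i, \sigma_i)}_{\vec{n}} = \sum_{\underline{\sigma}_{a \setminus i}} \mu(\underline{\sigma}_{a \setminus i} \mid \sigma_i) \prod_{j \in \partial a \setminus i} \Big( \sum_{\vec{n}_{j \to a}} q^{(j \to a, \sigma_j)}_{\vec{n}_{j \to a}} \Big) \, \delta_{\vec{n}, f(\{\vec{n}_{j \to a}\}_{j \in \partial a \setminus i})} , \tag{18}$$

$$q^{(i \to a, \sigma_i)}_{\vec{n}} = \prod_{b \in \partial i \setminus a} \Big( \sum_{\vec{n}_{b \to i}} \hat{q}^{(b \to i, \sigma_i)}_{\vec{n}_{b \to i}} \Big) \, \delta_{\vec{n}, g_{\sigma_i}(\{\vec{n}_{b \to i}\}_{b \in \partial i \setminus a})} , \tag{19}$$

with the boundary condition given by $q^{(i \to a, \sigma_i)}_{\vec{n}} = \delta_{\vec{n}, \vec{o}(\sigma_i)}$ for the leaves $i$. The distribution of the m.s.r. for $i$ when $\underline{\sigma}$ is drawn from $\mu$ can then be obtained from the distributions on the edges neighboring $i$,

$$q_n^{(i)} = \sum_{\sigma_i} \mu(\sigma_i) \prod_{a \in \partial i} \Big( \sum_{\vec{n}_{a \to i}} \hat{q}^{(a \to i, \sigma_i)}_{\vec{n}_{a \to i}} \Big) \, \delta_{n, \min_{\tau \neq \sigma_i} [g_{\sigma_i}(\{\vec{n}_{a \to i}\}_{a \in \partial i})]_\tau} . \tag{20}$$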



3. Average over the choice of the tree 

At this point we define an ensemble of random rooted tree factor graphs on which we shall perform the average of the m.s.r. distribution. The ingredients of the definition are $p_l$, a distribution on the positive integers, $\rho(\psi)$, a distribution on the 0/1 constraint functions (with possibly a random degree $k$), and a distribution of fields $\mathcal{P}(\eta)$. Let us denote by $\mathbb{T}_L$ a random tree of the ensemble of depth $L$, and by $\hat{\mathbb{T}}_L$, for notational simplicity, the elements of this ensemble conditioned on their root being of degree one. $\mathbb{T}_L$ is defined by induction on $L$ as a (Galton-Watson like) branching process. $\mathbb{T}_0$ is made of a single variable node (the root) to which is applied an external field $\eta$ drawn from $\mathcal{P}$. $\hat{\mathbb{T}}_L$ is generated by introducing a root variable node $i$, connected to a single interaction node $a$ whose constraint function $\psi_a$ is drawn from $\rho$. Then each variable node in $\partial a \setminus i$ is taken to be the root of an independently generated $\mathbb{T}_L$. Conversely $\mathbb{T}_{L+1}$ is made by identifying the roots of $l$ (a random integer drawn from $p_l$) independent copies of $\hat{\mathbb{T}}_L$.

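For each tree drawn from this ensemble the two recursive computations yield a set of messages on each edge of the factor graph directed towards the root, $(\eta, \{q^{(\sigma)}_{\vec{n}}\})$ for an edge from a variable to a constraint, $(\nu, \{\hat{q}^{(\sigma)}_{\vec{n}}\})$ from a constraint to a variable. The randomness in the definition of the tree turns these objects into random variables, whose distribution depends only on the distance between the considered edge and the leaves. To be more precise, let us call $\mathcal{P}_L(\eta, \{q^{(\sigma)}_{\vec{n}}\})$ the distribution of $(\eta_{i \to a}, \{q^{(i \to a, \sigma)}_{\vec{n}}\})$ when $i$ is the root of a random tree $\mathbb{T}_L$, and similarly $\hat{\mathcal{P}}_L(\nu, \{\hat{q}^{(\sigma)}_{\vec{n}}\})$ for the distribution of the messages directed to the root variable node of $\hat{\mathbb{T}}_L$.

One can first notice that the recursions between the messages $\eta$, $\nu$ do not involve the size distributions $q_{\vec{n}}$ and $\hat{q}_{\vec{n}}$, and thus define $\mathcal{P}_L(\eta)$ as the marginal of $\mathcal{P}_L(\eta, \{q^{(\sigma)}_{\vec{n}}\})$ disregarding the $q_{\vec{n}}$'s, and similarly $\hat{\mathcal{P}}_L(\nu)$ from $\hat{\mathcal{P}}_L(\nu, \{\hat{q}^{(\sigma)}_{\vec{n}}\})$. $\mathcal{P}_L$ and $\hat{\mathcal{P}}_L$ obey functional equations of the form $\hat{\mathcal{P}}_L = F[\mathcal{P}_L]$, $\mathcal{P}_{L+1} = G[\hat{\mathcal{P}}_L]$, with $\mathcal{P}_{L=0} = \mathcal{P}$, and where the functionals $F$ and $G$ have a compact distributional writing,

$$\nu \overset{d}{=} f(\eta_1, \dots, \eta_{k-1}, \psi) , \qquad \eta \overset{d}{=} g(\nu_1, \dots, \nu_l) . \tag{21}$$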
The first equation means that drawing a variable $\nu$ from $\hat{\mathcal{P}}_L$ amounts to drawing a constraint function $\psi$ from $\rho$, $k - 1$ i.i.d. variables $\eta_i$ from $\mathcal{P}_L$, and computing $\nu$ from Eq. (12). Similarly $\mathcal{P}_{L+1}$ is obtained from $\hat{\mathcal{P}}_L$ thanks to Eq. (13), with the branching number $l$ drawn from $p_l$. In the following we shall assume that the distribution $\mathcal{P}$ on the boundary of the tree is a solution of the fixed point functional equation $\mathcal{P} = G[F[\mathcal{P}]]$. This implies a stationarity property with respect to the number of generations $L$, $\mathcal{P}_L = \mathcal{P}$, $\hat{\mathcal{P}}_L = \hat{\mathcal{P}} = F[\mathcal{P}]$. This justifies a posteriori the choice we made of including non-trivial biases at the boundary in the law $\mu$: in generic models unbiased boundary conditions, represented by $\mathcal{P}(\eta) = \delta(\eta - \overline{\eta})$, do not satisfy this stationarity property; this will be in particular the case for the random $k$-SAT problem studied below.

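The evolution of the size distributions when iterating the tree construction is coupled, through the term $\mu(\underline{\sigma}_{a \setminus i} \mid \sigma_i)$ of Eq. (18), to the $\eta$, $\nu$ messages. We are however interested in a rather simple quantity, the average of the m.s.r. distribution of the root (see Eq. (20)) with respect to the random tree. It is thus possible to compute an average of the $q^{(i \to a, \sigma)}_{\vec{n}}$ on an edge of depth $L$, provided this average is conditioned on the value of the associated message $\eta_{i \to a}$. This conditional average, denoted $q^{(\sigma, L)}_{\vec{n}}(\eta)$, and its counterpart $\hat{q}^{(\sigma, L)}_{\vec{n}}(\nu)$, are then found to obey the following equations,

$$\hat{q}^{(\sigma, L)}_{\vec{n}}(\nu) \, \hat{\mathcal{P}}(\nu) = \mathbb{E}_\psi \int d\mathcal{P}(\eta_1) \dots d\mathcal{P}(\eta_{k-1}) \, \delta(\nu - f(\eta_1, \dots, \eta_{k-1}, \psi)) \sum_{\sigma_1, \dots, \sigma_{k-1}} \mu(\sigma_1, \dots, \sigma_{k-1} \mid \sigma, \eta_1, \dots, \eta_{k-1}, \psi) \sum_{\vec{n}_1, \dots, \vec{n}_{k-1}} q^{(\sigma_1, L)}_{\vec{n}_1}(\eta_1) \cdots q^{(\sigma_{k-1}, L)}_{\vec{n}_{k-1}}(\eta_{k-1}) \, \delta_{\vec{n}, f(\vec{n}_1, \dots, \vec{n}_{k-1}, \psi)} , \tag{22}$$

$$q^{(\sigma, L+1)}_{\vec{n}}(\eta) \, \mathcal{P}(\eta) = \sum_l p_l \int d\hat{\mathcal{P}}(\nu_1) \dots d\hat{\mathcal{P}}(\nu_l) \, \delta(\eta - g(\nu_1, \dots, \nu_l)) \sum_{\vec{n}_1, \dots, \vec{n}_l} \hat{q}^{(\sigma, L)}_{\vec{n}_1}(\nu_1) \cdots \hat{q}^{(\sigma, L)}_{\vec{n}_l}(\nu_l) \, \delta_{\vec{n}, g_\sigma(\vec{n}_1, \dots, \vec{n}_l)} , \tag{23}$$

with the boundary condition $q^{(\sigma, L=0)}_{\vec{n}}(\eta) = \delta_{\vec{n}, \vec{o}(\sigma)}$. Finally the sought-for average m.s.r.d. for the root of a random tree of depth $L$ reads:

$$q_n^{(L)} = \int d\mathcal{P}(\eta) \sum_\sigma \eta(\sigma) \sum_{\vec{n}} q^{(\sigma, L)}_{\vec{n}}(\eta) \, \delta_{n, \min_{\tau \neq \sigma} [\vec{n}]_\tau} . \tag{24}$$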

The numerical resolution of Eqs. (22, 23) could at first sight seem rather difficult, as they involve, for each value of the random variable $\eta$ (or $\nu$), $q$ distributions of vectors $\vec{n}$. One can however devise a simple method, generalizing the population dynamics algorithm of ^2^. The important point is to notice that for a given value of $\sigma$, $q^{(\sigma, L)}_{\vec{n}}(\eta) \mathcal{P}(\eta)$ can be viewed as a joint distribution of variables $(\eta, \vec{n}^{(\sigma)})$, which can be numerically represented by a population of a large number $\mathcal{N}$ of couples $\{(\eta_i, \vec{n}_i^{(\sigma)})\}_{i=1}^{\mathcal{N}}$. The empirical distribution of these couples is taken as an approximation (known as a particle approximation in the statistics literature) of $q^{(\sigma, L)}_{\vec{n}}(\eta) \mathcal{P}(\eta)$. This suggests the following algorithm. Initialize a population $\{\eta_i\}_{i=1}^{\mathcal{N}}$ drawn i.i.d. from $\mathcal{P}$ (this shall be itself performed by a standard population dynamics approach), and associate to each of them $q$ vectors, $\vec{n}_i^{(\sigma)} = \vec{o}(\sigma)$. We thus have, for trees of depth $L = 0$, a population $\{(\eta_i, \vec{n}_i^{(1)}, \dots, \vec{n}_i^{(q)})\}_{i=1}^{\mathcal{N}}$. To take this population from depth $L$ to depth $L + 1$ one has to

- generate in an i.i.d. way TV elements {vj , n^^'' , . . . , nj'''' ) , with j G [TV + 1 , 2TV] to avoid notational confusion, by: 

• choosing randomly a constraint function ip from p, and k — 1 indices ii, . . . ,ik-i uniformly at random in 
[1,AA]. 

• computing ly^ = f{r],^ t],^_^ , tp). 

• for each a G [1, 9], 

* generating a configuration (cti , . . . , 0-^-1) according to the law /x(- |cr, rji^, . . . , rii^_^ , tp). 

* computing n^-'"^ = f{nf^^\ . . • , n|j^*^^\ V)- 

- then generate a new population {{rji^n'p , . . . , nf^)}^^, repeating for each i G [1,TV] independently the following 
steps : 

• Choose randomly a degree I from pi and I indices ji, . . . , ji uniformly at random in [TV + 1, 2TV]. 

• Compute ?7i = g{vj^ , . . . , z/^-J. 

• For each ct G [1, q], compute n^f '' = ga-{n^'^\ . . . , n^j"^)- 
After $L$ iterations of these two steps, for a given value of $\sigma$, an element $(\eta_i, \mathbf{n}_i^{(\sigma)})$ with $i$ uniformly chosen in $[1,\mathcal{N}]$ is distributed with the joint law $q_{\mathbf{n}}^{(\sigma,L)}(\eta)\mathcal{P}(\eta)$ (see the footnote below). We can thus complete the computation of $q_n^{(L)}$ in terms of a weighted histogram,

$$ q_n^{(L)} = \frac{1}{\mathcal{N}} \sum_{i=1}^{\mathcal{N}} \sum_{\sigma=1}^{q} \eta_i(\sigma) \, \delta_{n,\min[\mathbf{n}_i^{(\sigma)}]} . \tag{25} $$

We shall now examine how this general formalism can be applied to the three exemplar problems of XORSAT, 
COL and SAT. 

B. k-XORSAT

1. On a given tree factor graph

Let us recall the factor graph representation of a $k$-XORSAT formula we use: the variables are Ising spins $\sigma_i = \pm 1$, and each constraint node $a$ is satisfied if and only if the product of its $k$ neighboring variables, $\prod_{i \in \partial a} \sigma_i$, is equal to a given constant $J_a = \pm 1$. The computation of the m.s.r., already performed in [26], is much simpler than the general case presented above. Note first that for any CSP where variables can only take two values, a rearrangement $\underline{\sigma} \to \underline{\tau}$ is completely specified by its support, the set $R$ of variables which differ between the initial and final configurations. A second simplification is specific to the XORSAT problem. Consider an initial solution $\underline{\sigma}$ and the configuration $\underline{\tau}$ obtained by flipping the variables in $R$. This second configuration is also a solution if and only if, for each constraint $a$, an even (possibly null) number of variables of $\partial a$ are in $R$. A rearrangement for the variable $i$ is hence a set $R$ verifying this condition and containing $i$. The m.s.r. $n_i$ is the minimal cardinality of such a set of variables; on a tree this minimum can always be achieved by requiring that each $\partial a$ contains either zero or two (and not a higher even number of) variables of $R$. The recursive strategy for the computation of $n_i$ and the construction of a rearrangement of this size amounts to constructing a m.s.r. $R_{a \to i}$ for all the branches $F_{a \to i}$ around $i$ (their sizes being denoted $1 + n_{a \to i}$) and to combining the rearrangements of the sub-factor graphs, $R = \{i\} \cup_{a \in \partial i} R_{a \to i}$. To construct $R_{a \to i}$ one has to choose exactly one variable $j \in \partial a \setminus i$ that minimizes the cost $n_{j \to a}$ of the rearrangement in the branch $F_{j \to a}$. Summarizing this reasoning in formulas, we obtain:

$$ n_{a \to i} = \min_{j \in \partial a \setminus i} n_{j \to a} , \qquad n_{i \to a} = 1 + \sum_{b \in \partial i \setminus a} n_{b \to i} , \qquad n_i = 1 + \sum_{a \in \partial i} n_{a \to i} . \tag{26} $$

The reader will easily verify that the equations (7,8,9,10) of the general formalism reduce indeed to this simple form, noting in particular that the m.s.r. is here independent of the initial configuration, as appears clearly from the geometric characterization of the optimal supports $R$.
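The recursion of Eq. (26) is easy to run on a small example. The following is a minimal sketch, assuming the tree factor graph is given as a dictionary mapping each constraint label to the tuple of variables it involves; the function name `xorsat_msr` and this data layout are illustrative choices, not part of the original formalism.

```python
def xorsat_msr(constraints, root):
    """Minimal size rearrangement of `root` on a tree factor graph, Eq. (26).

    `constraints` maps a constraint label to the tuple of its variables
    (hypothetical layout; the factor graph is assumed to be a tree).
    """
    # Map each variable to the constraints it belongs to.
    var_to_cons = {}
    for a, variables in constraints.items():
        for v in variables:
            var_to_cons.setdefault(v, []).append(a)

    def n_var_to_con(j, a):
        # n_{j->a} = 1 + sum over the other constraints b around j of n_{b->j}
        return 1 + sum(n_con_to_var(b, j) for b in var_to_cons[j] if b != a)

    def n_con_to_var(a, i):
        # n_{a->i} = min over the other variables j of a of n_{j->a}
        return min(n_var_to_con(j, a) for j in constraints[a] if j != i)

    # n_i = 1 + sum over all constraints a around the root of n_{a->i}
    return 1 + sum(n_con_to_var(a, root) for a in var_to_cons.get(root, []))


# A single 3-variable constraint: flipping variable 0 forces one more flip.
single = {"a": (0, 1, 2)}
# Both neighbors of 0 carry a further constraint, so the cheapest branch costs 2.
deeper = {"a": (0, 1, 2), "b": (1, 3, 4), "c": (2, 5, 6)}
print(xorsat_msr(single, 0), xorsat_msr(deeper, 0))
```

Since the signs $J_a$ never enter the computation, the sketch makes explicit that the result depends only on the geometry of the factor graph.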

2. Random tree 

This independence with respect to the initial configuration allows us to skip the second step of the general formalism, since for a given tree the distribution of the m.s.r. is trivially concentrated on a single integer, and to study directly the ensemble of random tree formulas. We shall follow the general definition of the random tree ensemble given above, with a Poisson law of parameter $\alpha k$ for the branching probability $p_l$, and all constraint nodes of degree $k$. For definiteness one can assume that the boundary condition is free (no bias on the leaves of the tree) and that $J_a = \pm 1$ with probability one half; these last two choices are in fact irrelevant, as the m.s.r. depends only on the geometry of the factor graph.

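This random ensemble induces a probability law $q_n^{(L)}$ for the m.s.r. of the root of a random tree of depth $L$, and an associated law $\hat{q}_n^{(L)}$ for the message sent to the root. Simplifying the equations (22,23,24) of the general formalism, or interpreting the specific ones (26) in a distributional sense, leads to

$$ \hat{q}_n^{(L)} = \sum_{n_1,\dots,n_{k-1}} q_{n_1}^{(L)} \cdots q_{n_{k-1}}^{(L)} \, \delta_{n,\min[n_1,\dots,n_{k-1}]} , \tag{27} $$

$$ q_n^{(L+1)} = \sum_{l=0}^{\infty} \frac{e^{-\alpha k} (\alpha k)^l}{l!} \sum_{n_1,\dots,n_l} \hat{q}_{n_1}^{(L)} \cdots \hat{q}_{n_l}^{(L)} \, \delta_{n,1+n_1+\dots+n_l} , \tag{28} $$

with the initial condition $q_n^{(L=0)} = \delta_{n,1}$.

Footnote (on the joint law stated just before Eq. (25)): We do not claim that $(\eta_i, \mathbf{n}_i^{(1)}, \dots, \mathbf{n}_i^{(q)})$ is drawn according to the product of $\mathcal{P}(\eta)$ with the $q$ conditional laws $q_{\mathbf{n}}^{(1,L)}(\eta), \dots, q_{\mathbf{n}}^{(q,L)}(\eta)$, i.e. that the $\mathbf{n}_i^{(\sigma)}$ are independent conditionally on $\eta_i$, which is not true. The algorithm induces correlations between the various values of $\sigma$, yet these are irrelevant for the linear averages we compute.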

These equations can be solved by a simplified version of the population dynamics algorithm introduced in the general case. The distributions $q_n^{(L)}$ and $\hat{q}_n^{(L)}$ are represented by samples of integers $\{n_i\}$. Each element of the population associated to $q_n^{(L+1)}$ is generated by drawing a Poisson distributed integer $l$, extracting at random $l$ elements of the sample representing $\hat{q}_n^{(L)}$ and computing their sum plus one. Conversely the elements of $\hat{q}_n^{(L)}$ are the minimum of $k-1$ randomly chosen integers drawn from the population encoding $q_n^{(L)}$. In the following we shall be interested in the $L \to \infty$ limit, which is the counterpart of the $N \to \infty$ thermodynamic limit of the original random graph ensembles. One could reach it numerically by repeated iterations of the population dynamics step. There is however a simpler numerical method, which amounts to taking this limit analytically.
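The simplified population dynamics just described can be sketched as follows. This is only an illustration under stated assumptions: the function names, the population size and the seed are arbitrary choices, and the Poisson sampler is Knuth's multiplication method, swapped in to keep the example free of external libraries.

```python
import math
import random


def poisson(mean, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def xorsat_population(alpha, k, depth, pop_size=5000, seed=0):
    """Sample of m.s.r. sizes representing q_n^{(L)} at L = depth, Eqs. (27,28)."""
    rng = random.Random(seed)
    q = [1] * pop_size  # depth 0: q_n = delta_{n,1}
    for _ in range(depth):
        # Eq. (27): a message is the minimum of k-1 sizes drawn from q.
        q_hat = [min(rng.choice(q) for _ in range(k - 1))
                 for _ in range(pop_size)]
        # Eq. (28): a size is one plus the sum of a Poisson number of messages.
        q = [1 + sum(rng.choice(q_hat) for _ in range(poisson(alpha * k, rng)))
             for _ in range(pop_size)]
    return q


sample = xorsat_population(alpha=0.4, k=3, depth=10)
# The weight on n = 1 should be close to exp(-alpha * k), up to sampling noise.
print(sample.count(1) / len(sample), math.exp(-1.2))
```

Python integers do not overflow, so the sketch also runs in the regime where a finite fraction of the sizes grows with the depth, although those entries then become very large.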

Let us first define the integrated version of the m.s.r.d.,

$$ Q_n^{(L)} = \sum_{n' \ge n} q_{n'}^{(L)} , \tag{29} $$

which gives the probability of a m.s.r. being at least as large as $n$. A few simple properties follow from this definition,

$$ q_n^{(L)} = Q_n^{(L)} - Q_{n+1}^{(L)} , \qquad Q_n^{(L)} = 1 - \sum_{n' < n} q_{n'}^{(L)} , \qquad \lim_{n \to \infty} Q_n^{(L)} = 0 . \tag{30} $$

A slightly less obvious property is that, for a fixed value of $n$, $Q_n^{(L)}$ is monotonically increasing with $L$. This arises from the fact that larger trees have larger rearrangements, and can be proven from (27,28) via a standard stochastic domination argument [30]. Being moreover bounded from above by 1, $Q_n^{(L)}$ converges as $L$ goes to infinity to a limit we shall denote $Q_n$. By continuity in the first equality of (30) the limit $q_n$ of $q_n^{(L)}$ also exists; the same statements apply to $\hat{Q}_n$ and $\hat{q}_n$. Eq. (27) can be rewritten as $\hat{Q}_n^{(L)} = (Q_n^{(L)})^{k-1}$; in the infinite $L$ limit we thus obtain:

$$ \hat{Q}_n = Q_n^{k-1} , \tag{31} $$

$$ q_n = \sum_{l=0}^{\infty} \frac{e^{-\alpha k} (\alpha k)^l}{l!} \sum_{n_1,\dots,n_l} \hat{q}_{n_1} \cdots \hat{q}_{n_l} \, \delta_{n,1+n_1+\dots+n_l} . \tag{32} $$

These limit distributions can now be determined by a recursion on $n$. Eq. (28) implies that $q_1^{(L)} = e^{-\alpha k}$ for all $L \ge 1$; hence $q_1 = e^{-\alpha k}$, which fixes the starting point of the recursion. Assume $q_n$ has been computed up to rank $m$. This means that $Q_n = 1 - \sum_{n' < n} q_{n'}$ is known up to rank $m+1$, and the same is true for $\hat{Q}_n$ because of Eq. (31). We thus have at our disposal the values of $\hat{q}_n$ up to $n = m$, which allows the computation of $q_{m+1}$ through Eq. (32). We defer the presentation of the numerical results obtained in this way until Section IV in order to confront them with the COL and SAT problems.
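The recursion on $n$ can be sketched in a few lines. The convolution in Eq. (32) is that of a compound Poisson sum; the sketch below evaluates it with the classical Panjer-type recursion $s\, g_s = \alpha k \sum_{j=1}^{s} j\, \hat{q}_j\, g_{s-j}$, $g_0 = e^{-\alpha k}$, where $g_s$ is the probability that the messages sum to $s$, so that $q_n = g_{n-1}$. This particular evaluation scheme is a choice made for the illustration and is not claimed to be the one used for the results of the paper; the function name is likewise hypothetical.

```python
import math


def msr_distribution_xorsat(alpha, k, n_max):
    """Return (q, Q) with q[n] and Q[n] in the L -> infinity limit, Eqs. (31,32)."""
    lam = alpha * k
    q = [0.0] * (n_max + 1)      # q[n] for n = 1..n_max (index 0 unused)
    Q = [0.0] * (n_max + 2)      # Q[n] = 1 - sum_{n' < n} q[n']
    q_hat = [0.0] * (n_max + 1)  # law of the messages, from Q_hat = Q^(k-1)
    g = [0.0] * n_max            # g[s] = P(sum of the messages = s)

    Q[1] = 1.0
    g[0] = math.exp(-lam)        # no neighboring constraint: q_1 = exp(-alpha k)
    q[1] = g[0]
    Q[2] = Q[1] - q[1]
    for m in range(1, n_max):
        # q is known up to rank m, hence Q and Q_hat up to rank m + 1.
        q_hat[m] = Q[m] ** (k - 1) - Q[m + 1] ** (k - 1)
        # Compound Poisson recursion for the sum of the messages.
        g[m] = lam / m * sum(j * q_hat[j] * g[m - j] for j in range(1, m + 1))
        q[m + 1] = g[m]
        Q[m + 2] = max(Q[m + 1] - q[m + 1], 0.0)  # guard against round-off
    return q, Q


q, Q = msr_distribution_xorsat(alpha=0.4, k=3, n_max=100)
print(q[1], Q[10], Q[100])
```

The cost is quadratic in `n_max`, which is sufficient for moderate ranges of $n$; long plateaus close to the transition would call for a faster convolution.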

Let us only anticipate one feature by emphasizing that the limit $L \to \infty$ was taken here at a fixed value of $n$. We shall see that for some values of $\alpha$ the limits $L, n \to \infty$ do not commute, a situation reminiscent of a percolating regime. In such cases $Q_n$ tends for large $n$ to a strictly positive value $\phi$, $q_n$ is not normalized anymore and cannot be directly considered as the distribution of an integer random variable $n$. It will be however convenient to formally consider $n$ as an extended integer, with a probability $\phi$ of being infinite.

C. q-COL

1. Given tree, given $\underline{\sigma}$

The second example of CSP we shall consider is the $q$-coloring problem. The variables $\sigma_i$ can take one of the $q$ values (colors) in $\{1, \dots, q\}$, and the constraint node $a$ linking two variables $i$ and $j$ forbids the configurations with $\sigma_i = \sigma_j$. The solutions of this CSP are thus the proper colorings of the underlying graph.

At variance with the XORSAT problem, the m.s.r. does depend on the initial satisfying assignment: take for instance a small graph made of a central site $i$ with $q-1$ neighbors. If in the initial coloring all the peripheral sites have distinct colors, the minimal size to rearrange $i$ is two. Otherwise, if at least two peripheral sites have the same color, there is one color available for the central site to be rearranged without modifying its neighborhood.

There is however room for simplifications with respect to the general formalism. Consider the constraint $a$ between two adjacent vertices $i$ and $j$. The vectorial message $n_{a \to i}(\underline{\sigma}_{a \to i})$ has only one non-zero component, corresponding to the perturbation $\sigma_i \to \tau_i = \sigma_j$. This is a formal consequence of Eq. (5), but has a very intuitive meaning: in the cavity graph $F_{a \to i}$ the root $\sigma_i$ can be given any value $\tau_i \neq \sigma_j$ without having to propagate the rearrangement. We can thus get rid of the vectorial character of the messages. Note also that the information contained in the messages $n_{a \to i}$ and $n_{j \to a}$ is redundant, as each constraint node involves only two variables. We shall thus eliminate the variable to constraint messages, and rename $n_{j \to i}(\underline{\sigma}_{j \to i})$ what was denoted in the general formalism $[n_{a \to i}(\underline{\sigma}_{a \to i})]_{\sigma_j}$. Simplifying Eqs. (7,8,9,10) with these new notations, we obtain

$$ n_{j \to i}(\underline{\sigma}_{j \to i}) = 1 + \min_{\tau \neq \sigma_j} \sum_{k \in \partial j \setminus i} \delta_{\sigma_k,\tau} \, n_{k \to j}(\underline{\sigma}_{k \to j}) , \tag{33} $$

$$ n_i(\underline{\sigma}) = 1 + \min_{\tau \neq \sigma_i} \sum_{j \in \partial i} \delta_{\sigma_j,\tau} \, n_{j \to i}(\underline{\sigma}_{j \to i}) , \tag{34} $$

with $n_{j \to i}(\sigma_j) = 1$ if $j$ is a leaf of the tree. The interpretation of these equations is clear: to modify the color $\sigma_i$ of a vertex $i$ in a coloring $\underline{\sigma}$ one has to probe the $q-1$ possibilities of $\tau \neq \sigma_i$, and follow the effect of this modification in the branches $F_{j \to i}$ that become unsatisfied, i.e. those which had $\sigma_j = \tau$ before the modification.
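The interpretation given above translates directly into a recursive computation on a given tree and a given proper coloring. The sketch below is an illustration only: the adjacency-dictionary layout, the color labels $1, \dots, q$ and the function name `coloring_msr` are assumptions made for the example. It reproduces the small star-graph example discussed earlier, a central site with $q-1$ neighbors.

```python
def coloring_msr(adj, colors, q, i, parent=None):
    """Minimal size rearrangement of vertex i in the branch away from `parent`.

    adj    : dict vertex -> list of neighbors (a tree, hypothetical layout)
    colors : dict vertex -> color in 1..q (a proper coloring)
    """
    best = None
    for tau in range(1, q + 1):
        if tau == colors[i]:
            continue  # the vertex must leave its current color
        # Branches that become unsatisfied are those whose root had color tau.
        cost = sum(coloring_msr(adj, colors, q, j, parent=i)
                   for j in adj[i]
                   if j != parent and colors[j] == tau)
        best = cost if best is None else min(best, cost)
    return 1 + best


# Star graph with q = 3: center 0 and two leaves.
star = {0: [1, 2], 1: [0], 2: [0]}
distinct = {0: 1, 1: 2, 2: 3}   # leaves use both remaining colors
repeated = {0: 1, 1: 2, 2: 2}   # color 3 is free for the center
print(coloring_msr(star, distinct, 3, 0), coloring_msr(star, repeated, 3, 0))
```

With distinct leaf colors the center cannot move without recoloring a leaf, giving a size of two; with a repeated leaf color one color is free and the size is one.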

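2. Given tree, distribution with respect to $\underline{\sigma}$

We shall study in the coloring case the distribution of the m.s.r. with respect to the measure $\mu(\underline{\sigma})$ uniform on the proper colorings. In other words we use a free boundary condition and do not impose any external field on the leaves. This choice preserves the permutation symmetry among colors, which implies that the marginal distribution $\mu(\sigma_i)$ of any variable $i$ is uniform over the $q$ possible values. Once the color of an arbitrary root variable $i$ has been chosen, the generation of the remaining sites can be done in a recursive way: the colors of the neighbors of $i$ are drawn independently, uniformly over the $q-1$ colors distinct from $\sigma_i$, and this process is repeated from $i$ outwards. Exploiting this symmetry and the recursions (33,34), one finds that the distribution of the m.s.r. with respect to the uniform choice of the initial proper coloring is given by

$$ q_n^{(i)} = \frac{1}{(q-1)^{|\partial i|}} \sum_{\{\sigma_j\}_{j \in \partial i}} \ \sum_{\{n_j\}_{j \in \partial i}} \ \prod_{j \in \partial i} q_{n_j}^{(j \to i)} \ \delta_{n,\, 1 + \min_{\tau} \sum_{j \in \partial i} \delta_{\sigma_j,\tau} n_j} , \tag{35} $$

where the distributions of the messages on the edges of the tree are solutions of

$$ q_n^{(j \to i)} = \frac{1}{(q-1)^{|\partial j| - 1}} \sum_{\{\sigma_k\}_{k \in \partial j \setminus i}} \ \sum_{\{n_k\}_{k \in \partial j \setminus i}} \ \prod_{k \in \partial j \setminus i} q_{n_k}^{(k \to j)} \ \delta_{n,\, 1 + \min_{\tau} \sum_{k \in \partial j \setminus i} \delta_{\sigma_k,\tau} n_k} , \tag{36} $$

with the boundary condition $q_n^{(j \to i)} = \delta_{n,1}$ when $j$ is a leaf. In both equations the colors of the neighbors and the value $\tau$ run over the $q-1$ colors distinct from the one of the central vertex.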



3. Average over the choice of the tree

We now consider the ensemble of random trees where the variable nodes have a Poissonian branching probability of mean $c$, and all constraint nodes are identical, $\psi(\sigma_i, \sigma_j) = \mathbb{I}(\sigma_i \neq \sigma_j)$. One can easily show from Eqs. (35,36) that the m.s.r.d. for uniformly distributed initial proper colorings, averaged over this random tree ensemble, is given by

$$ q_n^{(L+1)} = \sum_{l=0}^{\infty} \frac{e^{-c} c^l}{l!} \, \frac{1}{(q-1)^l} \sum_{\sigma_1,\dots,\sigma_l = 2}^{q} \ \sum_{n_1,\dots,n_l} \ \prod_{i=1}^{l} q_{n_i}^{(L)} \ \delta_{n,\, 1 + \min_{\tau = 2,\dots,q} \sum_{i=1}^{l} \delta_{\sigma_i,\tau} n_i} , \tag{37} $$

with $q_n^{(L=0)} = \delta_{n,1}$. This equation could be solved following the population dynamics approach explained above. One can however unveil a formal equivalence with the computation performed for the XORSAT problem. Consider indeed the random variables $l_\sigma$ which count in Eq. (37) the number of $\sigma_i$'s assigned to the value $\sigma$. Conditional on $l$ the $l_\sigma$'s are multinomially distributed; as $l$ is itself a Poisson random variable the $l_\sigma$ turn out to be independent Poisson random variables, each of mean $c/(q-1)$. This allows us to rewrite Eq. (37) as

$$ q_n^{(L+1)} = \sum_{m_2,\dots,m_q} \delta_{n,\min[m_2,\dots,m_q]} \prod_{\sigma=2}^{q} \left( \sum_{l_\sigma=0}^{\infty} \frac{e^{-c/(q-1)}}{l_\sigma!} \left( \frac{c}{q-1} \right)^{l_\sigma} \sum_{n_1,\dots,n_{l_\sigma}} q_{n_1}^{(L)} \cdots q_{n_{l_\sigma}}^{(L)} \, \delta_{m_\sigma,\, 1+n_1+\dots+n_{l_\sigma}} \right) . \tag{38} $$

Comparing with Eqs. (27,28) one realizes that the solution of the coloring case can be directly read off from the study of the XORSAT one with a simple translation of the parameters,

$$ k = q , \qquad \alpha = \frac{c}{q(q-1)} , \tag{39} $$

the coloring distribution $q_n^{(L)}$ playing the role of the XORSAT message distribution $\hat{q}_n^{(L)}$. In particular the simple recursion on $n$ to solve directly in the $L \to \infty$ limit is still applicable to the coloring problem.

D. k-SAT

1. Given tree, given $\underline{\sigma}$

We consider now the third example of CSP, in which the factor graph encodes a $k$-satisfiability formula. The boolean variables are represented by Ising spins $\sigma_i = \pm 1$; each constraint node $a$ is linked to $k$ variable nodes, and is unsatisfied if and only if these $k$ variables all take their unsatisfying value, $\sigma_i = J_i^a$ for all $i \in \partial a$. We shall denote $\partial_+ i(a)$ (resp. $\partial_- i(a)$) the set of clauses in $\partial i \setminus a$ agreeing (resp. disagreeing) with $a$ on the satisfying value of $\sigma_i$. We also denote $\partial_\sigma i$ the set of clauses in $\partial i$ which are satisfied by $\sigma_i = \sigma$.

Because of the boolean nature of the variables a rearrangement is specified by the set of variables to be flipped (recall the discussion of the XORSAT problem); we can thus get rid of the vectorial character of the general formalism and denote, for instance, $n_i(\underline{\sigma})$ for the m.s.r. of the variable $i$ under the perturbation $\sigma_i \to \tau_i = -\sigma_i$. This quantity does depend on the initial satisfying assignment. In the simplest case where there is one single constraint node $a$ in the factor graph, $n_i(\underline{\sigma}) = 2$ if $a$ was satisfied only by $i$ before its flip, and $n_i(\underline{\sigma}) = 1$ for all the other satisfying assignments. Generalizing this observation to generic factor graphs, one reduces the recursion relations of the general formalism (see Eqs. (7,8,9,10)) to:

$$ n_{a \to i}(\sigma_i, \underline{\sigma}_{a \to i}) = \begin{cases} \min_{j \in \partial a \setminus i} n_{j \to a}(\sigma_j, \underline{\sigma}_{j \to a}) & \text{if } \sigma_i = -J_i^a \text{ and } \sigma_j = J_j^a \ \forall j \in \partial a \setminus i \\ 0 & \text{otherwise} \end{cases} , \tag{40} $$

$$ n_{i \to a}(\sigma_i, \underline{\sigma}_{i \to a}) = 1 + \sum_{b \in \partial i \setminus a} n_{b \to i}(\sigma_i, \underline{\sigma}_{b \to i}) , \tag{41} $$

$$ n_i(\underline{\sigma}) = 1 + \sum_{a \in \partial i} n_{a \to i}(\sigma_i, \underline{\sigma}_{a \to i}) , \tag{42} $$

with again $n_{i \to a}(\sigma_i) = 1$ for the leaves of the graph.

2. Given tree, distribution with respect to $\underline{\sigma}$

We now consider the probability law $\mu(\underline{\sigma})$ on the initial satisfying assignments, with external fields on some of the leaves of the graph. More precisely, we use the form (11), with the biases on a subset $B$ of the leaves parameterized by a real $h_{{\rm ext},i}$:

$$ \eta_{{\rm ext},i}(\sigma_i) = \frac{1 + \sigma_i \tanh h_{{\rm ext},i}}{2} . \tag{43} $$

The messages $\nu_{a \to i}$ and $\eta_{i \to a}$ are probability laws of Ising spins and can thus be parameterized by a single real. To simplify the notations we make a gauge transformation with respect to the value of the variable satisfying the clause and define

$$ \nu_{a \to i}(\sigma_i) = \frac{1 - J_i^a \sigma_i \tanh u_{a \to i}}{2} , \qquad \eta_{i \to a}(\sigma_i) = \frac{1 - J_i^a \sigma_i \tanh h_{i \to a}}{2} . \tag{44} $$

With these conventions Eqs. (12,13) become

$$ u_{a \to i} = f(\{h_{j \to a}\}_{j \in \partial a \setminus i}) , \qquad h_{i \to a} = \sum_{b \in \partial_+ i(a)} u_{b \to i} - \sum_{b \in \partial_- i(a)} u_{b \to i} , \tag{45} $$

$$ f(h_1, \dots, h_{k-1}) = -\frac{1}{2} \ln \left( 1 - \prod_{i=1}^{k-1} \frac{1 - \tanh h_i}{2} \right) , \tag{46} $$

with $h_{i \to a} = -J_i^a h_{{\rm ext},i}$ if $i$ is a leaf in $B$, and $h_{i \to a} = 0$ if it is a leaf not in $B$. The solution of these equations allows us to compute the two quantities that we shall need below:

• the marginal law of (7^, 



1 + (T, tanh hi 



Ua^t - ^ Ua- 



(47) 



• the probability that, conditional on ai satisfying the constraint a, all other variables in da take their wrong 
values, 



(48) 



j€da\i 



We now proceed with the introduction of the distributions q^^^^''^'^ (resp. gi'^''''^'^) of the messages na—,i{<Ji,g_a—>i) 
(resp. 7ii_>a(cri, CTj_,^)) when g_a^i (resp. ct^^^)) is drawn conditionally on (7^. In fact for each directed edge the 
distribution corresponding to one of the two values of Ui can be discarded. Consider first the cavity factor graph 
Fa^i- If is drawn conditionally on ai not satisfying constraint a, necessarily one of the k — 1 other variables 

of a will satisfy it so that ai can be flipped without propagating the rearrangement further in the branch. This is 



5n,o, we shall thus simplify notation and write $1°^*^ instead of qi^^^''^' 



translated in formula as qi"^*''^' ^ 
for the only non-trivial size distribution born by the edge a — > i. This last quantity, in virtue of Eq. (jlSp . has to be 

expressed in terms of the distributions qli^"'''^''' for j E da\i. However the rearrangement has to be propagated only 



if none of these variables were satisfying constraint a, we can thus rename qn 
qn ' ' "^^^ ■ Collecting these various observations we obtain 



Qn 



and forget about 



n E '^".^ 



1 n ^-^^f"^- u„.o 



nl — tanh hj^a 
9 



jGda\inj^a \_ \ jeda\i 

bed-i{a) nt^i 



, (49) 
(50) 



with qii 



6n,i on the leaves. Finally the law of the m.s.r. for i is given by 



<7 aGdfrina^i 



(51) 



3. Average over the choice of the tree

We shall study random trees with a Poissonian law of mean $\alpha k$ for the branching probability $p_l$ of variable nodes. The constraint nodes are all of degree $k$, with the signs $J_i^a$ of the unsatisfying literals i.i.d. random variables equal to $\pm 1$ with equal probability. This implies that the cardinalities of the neighborhoods $\partial_+ i$ and $\partial_- i$ of the root are two independent Poisson random variables of mean $\alpha k / 2$, whose law shall be denoted $p_{l_+,l_-}$. The same statement is true for $\partial_+ i(a)$ and $\partial_- i(a)$ in the bulk of the tree. The last element defining the random tree ensemble is the distribution $\mathcal{P}(h)$ for the biases on the leaves of depth $L$ of the tree. Following the general formalism we assume this distribution to be stationary under the iterations

$$ u = f(h_1, \dots, h_{k-1}) , \qquad h = \sum_{i=1}^{l_+} u_i^+ - \sum_{i=1}^{l_-} u_i^- , \tag{52} $$

where $l_\pm$ are drawn from $p_{l_+,l_-}$ and the $h_i$ (resp. the $u_i^\pm$) are independent copies drawn from $\mathcal{P}(h)$ (resp. $\hat{\mathcal{P}}(u)$).

The computation proceeds with the introduction of $q_n^{(L)}(h)$ (resp. $\hat{q}_n^{(L)}(u)$), the average of the $q_n^{(i \to a)}$ (resp. $\hat{q}_n^{(a \to i)}$) conditioned on the event $h_{i \to a} = h$ (resp. $u_{a \to i} = u$). The generic equations (22,23) translate into

$$ \hat{q}_n^{(L)}(u) \hat{\mathcal{P}}(u) = \int \prod_{i=1}^{k-1} d\mathcal{P}(h_i) \ \delta(u - f(h_1, \dots, h_{k-1})) \left[ \left( 1 - \prod_{i=1}^{k-1} \frac{1 - \tanh h_i}{2} \right) \delta_{n,0} + \left( \prod_{i=1}^{k-1} \frac{1 - \tanh h_i}{2} \right) \sum_{n_1,\dots,n_{k-1}} \prod_{i=1}^{k-1} q_{n_i}^{(L)}(h_i) \ \delta_{n,\min[n_1,\dots,n_{k-1}]} \right] , \tag{53} $$

$$ q_n^{(L+1)}(h) \mathcal{P}(h) = \sum_{l_+,l_-=0}^{\infty} p_{l_+,l_-} \int \prod_{i=1}^{l_+} d\hat{\mathcal{P}}(u_i^+) \prod_{i=1}^{l_-} d\hat{\mathcal{P}}(u_i^-) \ \delta \left( h - \sum_{i=1}^{l_+} u_i^+ + \sum_{i=1}^{l_-} u_i^- \right) \sum_{n_1,\dots,n_{l_-}} \prod_{i=1}^{l_-} \hat{q}_{n_i}^{(L)}(u_i^-) \ \delta_{n,1+n_1+\dots+n_{l_-}} , \tag{54} $$

with $q_n^{(L=0)}(h) = \delta_{n,1}$. Finally the sought-for average m.s.r.d. for the root of a random tree of depth $L$ reads

$$ q_n^{(L)} = \int d\mathcal{P}(h) \, (1 - \tanh h) \, q_n^{(L)}(h) , \tag{55} $$

which is obtained from (51) by using the statistical equivalence between positive and negative literals. This implies in particular that $h$ has a symmetric distribution, so that $q_n^{(L)}$ is well normalized.

The adaptation of the general population dynamics algorithm to this case is simple. The joint distribution $q_n^{(L)}(h)\mathcal{P}(h)$ is represented by a sample of couples $\{(h_i, n_i)\}_{i=1}^{\mathcal{N}}$, initialized with $n_i = 1$ and the $h_i$'s distributed according to $\mathcal{P}(h)$ (thanks to preliminary population dynamics steps). The recursion over $L$ amounts to generating a sample $\{(u_j, \hat{n}_j)\}$, where for each $j$ one selects $k-1$ indices $i_1, \dots, i_{k-1}$ in $[1, \mathcal{N}]$; $u_j$ is set to $f(h_{i_1}, \dots, h_{i_{k-1}})$, and $\hat{n}_j$ to the minimum of $\{n_{i_1}, \dots, n_{i_{k-1}}\}$ with probability $1 - \exp[-2u_j]$, to 0 otherwise. In the converse step, for each new element $i$ two Poisson integers $l_\pm$ of mean $\alpha k / 2$ are independently drawn, then two sets of indices $J_+$ and $J_-$ of cardinalities $l_+$ and $l_-$ are generated; $h_i$ is given by $\sum_{j \in J_+} u_j - \sum_{j \in J_-} u_j$, while $n_i$ reads $1 + \sum_{j \in J_-} \hat{n}_j$. From (55) we obtain $q_n^{(L)}$ as a weighted histogram of the population,

$$ q_n^{(L)} = \frac{1}{\mathcal{N}} \sum_{i=1}^{\mathcal{N}} (1 - \tanh h_i) \, \delta_{n,n_i} . \tag{56} $$

The large $L$ limit is obtained by repeating a sufficient number of these steps to achieve convergence within numerical precision.


IV. THE FREEZING TRANSITION IN RANDOM TREE ENSEMBLES 



In the previous section we have established numerical procedures to compute the average m.s.r.d. $q_n$ for the various random tree ensembles, based either on a simple recurrence over $n$ for the XORSAT and COL cases, or on a more elaborate population dynamics algorithm for SAT. We now discuss the outcome of these computations, the limit of infinite depth trees ($L \to \infty$) being kept implicit.

In Fig. 3 we have plotted the integrated distribution $Q_n$, for various values of $\alpha$, in the XORSAT and SAT cases. These two families of curves present the same striking feature: when $\alpha$ is increased $Q_n$ develops a plateau, in other words $q_n$ becomes bimodal with a positive fraction of rearrangements shifting towards larger and larger values. When a critical value $\alpha_p$ is reached the length of the plateau becomes infinite. This transition is thus described by the order parameter $\phi = \lim_{n \to \infty} Q_n$, which represents the fraction of percolating optimal rearrangements whose size diverges with $L$. From the point of view of the order parameter the transition is discontinuous: $\phi$ jumps from 0 to a positive value $\phi_p$ when the threshold $\alpha_p$ is crossed.

FIG. 3: Integrated average distribution of minimal size rearrangements in tree ensembles. Left: random 3-XORSAT, from left to right $\alpha = 0.4, 0.7, 0.78, 0.8, 0.81, 0.815, \alpha_p$. The dashed horizontal line is the order parameter at the transition, $\phi_p \approx 0.715332$. Right: random 3-SAT, from left to right $\alpha = 3, 4, 4.3, 4.36, 4.39, 4.40, 4.41$. The dashed line indicates $\phi_p \approx 0.74$.



Let us follow the interpretation suggested at the end of Sec. III B 2 of $q_n$ being the distribution of an extended integer which has probability $\phi$ of being infinite. With the rules that the minimum of several such extended integers is infinite if and only if each of them is infinite, while their sum is infinite as soon as one of them is so, Eqs. (31,32) imply in the XORSAT case

$$ \hat{\phi} = \phi^{k-1} , \qquad \phi = 1 - \exp[-\alpha k \hat{\phi}] , \tag{57} $$

where we denoted $\hat{\phi} = \lim_{n \to \infty} \hat{Q}_n$. This can be closed under the form $\phi = 1 - \exp[-\alpha k \phi^{k-1}]$, with $\alpha_p$ being the smallest value of $\alpha$ for which there exists a non-trivial solution. At $\alpha_p$ this solution appears discontinuously, with the positive value $\phi_p$ corresponding to the height of the plateau in the curves of $Q_n$. For larger values of $\alpha$ there are two non-trivial solutions, the relevant one being the largest. Numerical values of $\alpha_p$ and $\phi_p$ are given in Table I for a few values of $k$.
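As a quick numerical check of this threshold, note that any non-trivial solution of $\phi = 1 - \exp[-\alpha k \phi^{k-1}]$ satisfies $\alpha = -\ln(1-\phi) / (k \phi^{k-1})$, so that $\alpha_p$ is the minimum of the right-hand side over $\phi \in (0,1)$ and $\phi_p$ the location of this minimum. The sketch below uses this rearrangement together with a plain ternary search; both the rearrangement and the function name are choices made for the illustration, assuming $k \ge 3$ so that the minimum lies in the interior of the interval.

```python
import math


def xorsat_freezing_threshold(k, iterations=200):
    """Return (alpha_p, phi_p) for the closed form of Eq. (57), assuming k >= 3."""
    def alpha_of(phi):
        # Value of alpha for which phi solves phi = 1 - exp(-alpha k phi^(k-1)).
        return -math.log1p(-phi) / (k * phi ** (k - 1))

    lo, hi = 1e-9, 1.0 - 1e-12
    for _ in range(iterations):
        # Ternary search for the minimum of a unimodal function.
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if alpha_of(m1) < alpha_of(m2):
            hi = m2
        else:
            lo = m1
    phi_p = 0.5 * (lo + hi)
    return alpha_of(phi_p), phi_p


for k in (3, 4):
    print(k, xorsat_freezing_threshold(k))
```

For $k = 3$ and $k = 4$ this reproduces the XORSAT entries $\alpha_p$ and $\phi_p$ of Table I to the quoted precision.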

Thanks to the formal equivalence between XORSAT and COL summarized in Eq. (39) we immediately obtain the equation on the order parameter of the COL freezing transition and the critical value $c_p$ (see also Table I for their numerical values),

$$ \phi = \left( 1 - \exp\left[ -\frac{c \, \phi}{q-1} \right] \right)^{q-1} , \qquad c_p^{({\rm COL})}[q] = q(q-1) \, \alpha_p^{({\rm XORSAT})}[k=q] . \tag{58} $$

The initiated reader will recognize the order parameter as the fraction of hard fields in the solution of the 1RSB equations at $m = 1$ given in [21]; we shall come back to this point later on.

The determination of the threshold $\alpha_p$ is slightly more involved in the SAT problem. We have indeed a family of distributions $q_n(h)$, $\hat{q}_n(u)$ indexed by a real $h$, $u$; it is thus necessary to define for each of them an order parameter $\phi(h)$, $\hat{\phi}(u)$, as the fraction of infinite values of $n$ borne by $q_n(h)$, $\hat{q}_n(u)$. The equivalent of Eq. (57) now takes a functional form easily derived from Eqs. (53,54),

$$ \hat{\phi}(u) \hat{\mathcal{P}}(u) = \int \prod_{i=1}^{k-1} d\mathcal{P}(h_i) \ \delta(u - f(h_1, \dots, h_{k-1})) \prod_{i=1}^{k-1} \frac{1 - \tanh h_i}{2} \, \phi(h_i) , \tag{59} $$

$$ \phi(h) \mathcal{P}(h) = \sum_{l_+,l_-=0}^{\infty} p_{l_+,l_-} \int \prod_{i=1}^{l_+} d\hat{\mathcal{P}}(u_i^+) \prod_{i=1}^{l_-} d\hat{\mathcal{P}}(u_i^-) \ \delta \left( h - \sum_{i=1}^{l_+} u_i^+ + \sum_{i=1}^{l_-} u_i^- \right) \left( 1 - \prod_{i=1}^{l_-} (1 - \hat{\phi}(u_i^-)) \right) . \tag{60} $$

From the solution of these equations the order parameter of the average m.s.r.d. is obtained (see Eq. (55)) as $\phi = \int d\mathcal{P}(h) (1 - \tanh h) \phi(h)$. Again, $\phi$ is the fraction of hard fields in the $m = 1$ 1RSB equations of [22]; this connection shall be discussed further in Sec. IV C and App. B. A solution of the functional equation on $\phi(h)$ can be
        XORSAT               COL                   XORSAT and COL                             SAT
k,q   alpha_p   phi_p      c_p        phi_p      lambda    a         b         nu         alpha_p  phi_p  lambda  a     b     nu
3     0.818469  0.715332   4.910815   0.511700   0.397953  0.422096  1.221834  1.593787   4.40     0.74   0.55    0.38  0.90  1.87
4     0.772280  0.851001   9.267358   0.616297   0.350174  0.433412  1.341647  1.526313   10.55    0.86   0.40    0.42  1.22  1.60
5     0.701780  0.903350   14.035605  0.665924   0.320971  0.439997  1.421808  1.488035   21.22    0.91   0.33    0.44  1.40  1.50
6     0.637081  0.930080   19.112434  0.695986   0.300707  0.444431  1.481191  1.462601   39.87    0.93   0.31    0.44  1.45  1.47
7     0.581775  0.945975   24.434557  0.716600   0.285554  0.447677  1.527913  1.444121   -        -      -       -     -     -
8     0.534997  0.956381   29.959848  0.731841   0.273649  0.450187  1.566174  1.429899   -        -      -       -     -     -
9     0.495255  0.963661   35.658363  0.743697   0.263961  0.452205  1.598411  1.418505   -        -      -       -     -     -
10    0.461197  0.969008   41.507763  0.753261   0.255868  0.453873  1.626162  1.409102   -        -      -       -     -     -

TABLE I: Threshold, order parameter and critical exponents for the freezing transition in random tree ensembles.



sought by a population dynamics algorithm: the distribution $\mathcal{P}(h)$ being represented by a sample $\{h_i\}$, we associate to each of them an estimation $\phi_i$ of $\phi(h_i)$ and consider a population of couples $\{(h_i, \phi_i)\}$. From this a new population $\{(u_j, \hat{\phi}_j)\}$ is generated according to Eq. (59): for each element of the new population $k-1$ indices $i_1, \dots, i_{k-1}$ are chosen uniformly at random in $[1, \mathcal{N}]$ and the new couple $(u_j, \hat{\phi}_j)$ is computed as

$$ (u_j, \hat{\phi}_j) = \left( f(h_{i_1}, \dots, h_{i_{k-1}}) , \ \prod_{l=1}^{k-1} \frac{1 - \tanh h_{i_l}}{2} \, \phi_{i_l} \right) . \tag{61} $$

In turn the sample $\{(h_i, \phi_i)\}$ is generated from the $\{(u_i, \hat{\phi}_i)\}$'s according to (60), and an estimation for the order parameter is computed as

$$ \phi = \frac{1}{\mathcal{N}} \sum_{i=1}^{\mathcal{N}} (1 - \tanh h_i) \, \phi_i . \tag{62} $$

These two steps are iterated a large number of times, starting with the initial condition $\phi(h) = 1$, i.e. $\phi_i = 1$ for all elements of the initial population. For small values of $\alpha$ the function $\phi$ converges to 0 upon these iterations, while for larger values a non-trivial fixed point is reached. The numerical estimation of the threshold $\alpha_p$ separating these two regimes, along with the deduced order parameter at the transition, are presented in Table I. The precision on these numbers is rather low; indeed, strong finite $\mathcal{N}$ effects make a precise determination of the discontinuous disappearance of the non-trivial solution difficult. Moreover the numerical method becomes difficult for large values of $k$, hence the limitation of the results presented to $k \in [3, 6]$. For $k = 3$, $\phi_p$ can also be successfully compared on the right part of Fig. 3 with the plateau in the numerically determined $Q_n$.
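The population dynamics on the couples $(h_i, \phi_i)$ can be sketched as follows. This is a rough illustration under several assumptions that are not taken from the text: the fields are equilibrated by a few preliminary sweeps started from $h_i = 0$, the population is small, the Poisson sampler is Knuth's multiplication method, and all function names and default parameters are hypothetical. The clause step uses $u = -\frac{1}{2} \ln\left(1 - \prod_l (1 - \tanh h_{i_l})/2\right)$ for the function $f$.

```python
import math
import random


def poisson(mean, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sweep(h, phi, alpha, k, rng):
    """One clause step followed by one variable step on the population."""
    size = len(h)
    u, phi_hat = [], []
    for _ in range(size):
        idx = [rng.randrange(size) for _ in range(k - 1)]
        all_wrong = 1.0  # probability that the k-1 variables take wrong values
        frozen = 1.0
        for i in idx:
            all_wrong *= 0.5 * (1.0 - math.tanh(h[i]))
            frozen *= phi[i]
        u.append(-0.5 * math.log(max(1.0 - all_wrong, 1e-300)))
        phi_hat.append(all_wrong * frozen)
    new_h, new_phi = [], []
    for _ in range(size):
        plus = [rng.randrange(size) for _ in range(poisson(alpha * k / 2, rng))]
        minus = [rng.randrange(size) for _ in range(poisson(alpha * k / 2, rng))]
        new_h.append(sum(u[j] for j in plus) - sum(u[j] for j in minus))
        none_infinite = 1.0
        for j in minus:
            none_infinite *= 1.0 - phi_hat[j]
        new_phi.append(1.0 - none_infinite)
    return new_h, new_phi


def sat_order_parameter(alpha, k, pop_size=1000, n_equil=10, n_iter=30, seed=1):
    rng = random.Random(seed)
    h = [0.0] * pop_size
    ones = [1.0] * pop_size
    for _ in range(n_equil):          # preliminary steps for the fields only
        h, _ = sweep(h, ones, alpha, k, rng)
    phi = list(ones)                  # initial condition phi(h) = 1
    for _ in range(n_iter):
        h, phi = sweep(h, phi, alpha, k, rng)
    # Weighted average of the estimates phi_i over the population.
    return sum((1.0 - math.tanh(hi)) * p for hi, p in zip(h, phi)) / pop_size


print(sat_order_parameter(alpha=1.0, k=3))
```

Far below the threshold the estimate collapses to zero within a few sweeps; locating the threshold itself would require much larger populations and many more iterations, in line with the strong finite-size effects mentioned in the text.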

The discontinuous character of the transition exhibited by the jump of the order parameter should not hide the strong precursor effects, usually associated to continuous transitions, present in the low connectivity phase. The existence of a diverging scale of rearrangement sizes is indeed obvious in Fig. 3. One can for instance define $n_\epsilon(\alpha)$ as the point where $Q_n$ crosses a threshold $\epsilon$. This scale $n_\epsilon(\alpha)$ diverges at $\alpha_p$ (as long as $0 < \epsilon < \phi_p$); in other words arbitrarily large rearrangements are present with positive probability sufficiently close to the transition. A detailed study of the XORSAT problem [26], drawing on a formal analogy with the mode-coupling theory of supercooled liquids [31], revealed that the divergence of $n_\epsilon$ is algebraic, $n_\epsilon \sim (\alpha_p - \alpha)^{-\nu}$. This exponent $\nu$ is the solution of a universal type of relations,

$$ \nu = \frac{1}{2a} + \frac{1}{2b} , \qquad \frac{\Gamma^2(1-a)}{\Gamma(1-2a)} = \frac{\Gamma^2(1+b)}{\Gamma(1+2b)} = \lambda , \tag{63} $$

where $\Gamma$ denotes Euler's Gamma function (see Fig. 4) and $\lambda$ a $k$-dependent parameter in $[0,1]$. In fact $a$ and $b$ are also critical exponents governing the asymptotic behavior of $Q_n$ around its plateau, see App. A for details. The non-universal parameter $\lambda$ was found [26] to be, in the XORSAT case,

$$ \lambda^{({\rm XORSAT})} = \frac{k-2}{\alpha_p \, k(k-1) \, \phi_p^{k-1}} . \tag{64} $$
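Given a value of the parameter $\lambda$, the exponents can be obtained by solving $\Gamma^2(1-a)/\Gamma(1-2a) = \lambda$ for $a \in (0, 1/2)$ and $\Gamma^2(1+b)/\Gamma(1+2b) = \lambda$ for $b > 0$, then setting $\nu = 1/(2a) + 1/(2b)$. The sketch below does this by bisection, relying on the fact that both ratios decrease monotonically from 1; the function names and the search interval for $b$ are assumptions made for the illustration.

```python
import math


def _solve_decreasing(fn, lo, hi, target, iterations=200):
    """Bisection for fn(x) = target when fn is decreasing on [lo, hi]."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if fn(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def critical_exponents(lam):
    """Return (a, b, nu) for a parameter 0 < lam < 1."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")

    def ratio_a(a):
        # Gamma(1-a)^2 / Gamma(1-2a), evaluated through log-Gamma.
        return math.exp(2.0 * math.lgamma(1.0 - a) - math.lgamma(1.0 - 2.0 * a))

    def ratio_b(b):
        # Gamma(1+b)^2 / Gamma(1+2b), evaluated through log-Gamma.
        return math.exp(2.0 * math.lgamma(1.0 + b) - math.lgamma(1.0 + 2.0 * b))

    a = _solve_decreasing(ratio_a, 0.0, 0.5 - 1e-12, lam)
    b = _solve_decreasing(ratio_b, 0.0, 50.0, lam)
    return a, b, 1.0 / (2.0 * a) + 1.0 / (2.0 * b)


# lambda for 3-XORSAT, computed from the threshold and the order parameter.
lam3 = (3 - 2) / (0.818469 * 3 * 2 * 0.715332 ** 2)
print(lam3, critical_exponents(lam3))
```

With the $k = 3$ values of $\alpha_p$ and $\phi_p$ this gives $\lambda \approx 0.39795$ and exponents matching the first row of Table I.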



Numerical values of this parameter and the associated exponents $a$, $b$, $\nu$ can be found in Table I. Because of Eq. (39) the exponents for the $q$-coloring are exactly the same as the ones of $k$-XORSAT, provided one identifies $k$ and $q$.

FIG. 4: The exponent $a$ (respectively $-b$) is the positive (respectively negative) root of the equation represented here, see Eq. (63).

It will be useful for future discussion to rewrite the parameter $\lambda$ under the form



(65) 



The asymptotic behavior of the distribution $q_n$ for SAT could be a priori more complicated, because of the underlying infinity of distributions $q_n(h)$. We shall however argue in App. A that the phenomenology remains the same; in particular the exponents $a$, $b$ and $\nu$ are still given by Eq. (63). The parameter $\lambda$ is now

$$ \lambda^{({\rm SAT})} = \frac{2^k (k-2)}{\alpha_p \, k(k-1) \, \phi_p^{k-1}} , \tag{66} $$

the expression (64) being only modified by a scale factor $2^k$ on the connectivity. The value of $\lambda$ can thus be determined from the numerical evaluation of $\alpha_p$ and $\phi_p$ explained above (see Table I for the results). The technical details of the analysis, along with numerical evidence supporting it, can be found in App. A.



A DIGRESSION ABOUT THE RECONSTRUCTION PROBLEM 



It is instructive, and shall be useful for the discussion of the following Section, to reconsider the freezing transition from a slightly different perspective, namely the problem of tree reconstruction [32]. For simplicity we consider first the $q$-colorings of regular trees with $L+1$ generations, where every vertex has degree $l+1$ (apart from the root which has degree $l$ and from the leaves of degree 1). The generation of a uniform proper coloring can be seen as a broadcasting process: the color of the root being chosen, each of its sons has a color uniformly chosen among the $q-1$ other ones, and this is propagated until the leaves of depth $L$ have been reached. In an information theoretic vision the color of the root is an information transmitted through a noisy channel, the tree. The reconstruction problem consists in inferring the color of the root given the observation of the colors of the leaves, while the rest of the coloring is hidden from the observer. Depending on the values of $(l, q)$ a correlation between the color of the root and the one of the leaves survives or not the limit $L \to \infty$. An optimal algorithm will be able to infer the value of the root from the observation of the leaves with a probability of success larger than the one of a random uniform guess if and only if this correlation remains positive. In this case the reconstruction problem is said to be solvable, which can also be formulated as the non-extremality (or impurity) of the free-boundary Gibbs measure [33] on the infinite tree. On general grounds one expects a critical value $l_d(q)$ separating a solvable regime for $l \ge l_d(q)$ and an unsolvable one when $l < l_d(q)$. The values $l_d(3) = 5$, $l_d(4) = 8$, $l_d(5) = 13$ and $l_d(6) = 17$ have been conjectured in [33], along with rigorous bounds $l_d(3) \le 5$, $l_d(4) \le 9$, $l_d(5) \le 13$ and $l_d(6) \le 17$.

A very naive, suboptimal algorithm to perform this inference proceeds from the leaves towards the root, according
to the following rule: if the set of colors on the descendants of a vertex $i$ contains $q-1$ distinct elements of $[1,q]$, the
remaining color is assigned to $\sigma_i$. Otherwise the vertex is assigned a neutral color, say white, $\sigma_i = 0$. It is easy
to see that at the end of the execution of this algorithm, starting from the observation of the leaves of a proper
coloring, the vertices in the interior of the tree are either white or have been assigned the correct value they had in
the initial coloring. What is the probability $\phi_L$ (with respect to the choice of the initial coloring) that the root is
correctly reconstructed in this way? For this to be possible, $q-1$ distinct colors had to be assigned to its sons in the
initial coloring, and for each color at least one of the sons bearing it had to be correctly reconstructed. $\phi_L$ can thus be determined
by recurrence according to $\phi_{L+1} = V(\phi_L)$, with

$$ V(\phi) = \sum_{p=0}^{q-1} (-1)^p \binom{q-1}{p} \left( 1 - \frac{p\,\phi}{q-1} \right)^{l} , \tag{67} $$

and $\phi_{L=0} = 1$. (Each of the $l$ sons independently bears a given color and is correctly reconstructed with probability
$\phi/(q-1)$; Eq. (67) is the inclusion-exclusion expression of the probability that this happens at least once for each of
the $q-1$ colors.) The limit of $\phi_L$ for large $L$ is the largest solution of the fixed-point equation $\phi = V(\phi)$ on the interval
$[0,1]$. Depending on the values of $q$ and $l$ this limit is either zero (for instance $V$ vanishes identically if $l < q-1$, as
there are not enough descendants for the root to be fully constrained) or strictly positive. By numerical inspection of
Eq. (67) we found the latter case to happen when $l \ge l_p(q)$, with $l_p(3) = 5$, $l_p(4) = 9$, $l_p(5) = 14$ and $l_p(6) = 19$. This
means that the algorithm has a positive probability of correctly guessing the root from the observation of arbitrarily
distant leaves when $l \ge l_p(q)$, whereas it is doomed to fail if $l < l_p(q)$. The reasoning presented here is essentially a
constructive proof of the bound $l_d(q) \le l_p(q)$, weaker yet conceptually much simpler than the rigorous bound of [3J].
Let us underline that such a reconstruction procedure is far from optimal: we only retain the information given by
a drastic event, when the color of a vertex is unambiguously determined by its descendants, and discard the cases
where one color is only more probable than the others.
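The numerical inspection of the fixed-point equation can be sketched as follows. This is a minimal illustration, not the procedure actually used for the values quoted above: it assumes the inclusion-exclusion form $V(\phi) = \sum_{p=0}^{q-1} (-1)^p \binom{q-1}{p} (1 - p\phi/(q-1))^l$ for the recursion, and the function names `V`, `phi_infinity` and `naive_threshold` are hypothetical.

```python
from math import comb

def V(phi, q, l):
    """Probability that each of the q-1 allowed colours is borne by at least
    one correctly reconstructed son among l sons (assumed form of Eq. (67))."""
    total = sum((-1) ** p * comb(q - 1, p) * (1.0 - p * phi / (q - 1)) ** l
                for p in range(q))
    return min(1.0, max(0.0, total))  # guard against floating-point noise

def phi_infinity(q, l, tol=1e-12, max_iter=100_000):
    """Iterate phi_{L+1} = V(phi_L) from phi_0 = 1; since V is increasing,
    the iteration decreases towards the largest fixed point in [0, 1]."""
    phi = 1.0
    for _ in range(max_iter):
        new = V(phi, q, l)
        if abs(new - phi) < tol:
            return new
        phi = new
    return phi

def naive_threshold(q, l_max=100):
    """Smallest l for which the large-L limit of phi_L is strictly positive."""
    for l in range(1, l_max + 1):
        if phi_infinity(q, l) > 1e-6:
            return l
    return None

for q in (3, 4, 5, 6):
    print(q, naive_threshold(q))
```

Starting the iteration from $\phi = 1$ matters: the map also has the trivial fixed point $\phi = 0$, and only the iteration from above selects the largest solution.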

This naive reconstruction algorithm is in a sense dual to the main subject of the paper: it correctly infers the
color of the root if and only if the root always takes the same color in all proper colorings compatible with the
observed assignment of the leaves. In other words, in all rearrangements (not necessarily of minimum size) for the root starting from
the initial coloring, at least one site on the boundary of the tree has to be rearranged. This can be determined using
the recursion relation on the sizes of the minimal rearrangements (for instance (33,34) in the case of the coloring)
with a different boundary condition, $n_{i\to j} = \infty$ when $i$ is a leaf of the tree. The value computed with this
boundary condition will be infinite if there are no rearrangements of the root which can avoid rearranging the leaves
(the algorithm is successful), finite otherwise (the root is white at the end of the algorithm). This difference in the
boundary condition ($n_{i\to j} = \infty$ vs 1) is irrelevant in the large $L$ limit: m.s.r. of finite size have supports of finite
depth, hence are not affected by the boundary when $L$ gets larger than this depth, while m.s.r. of sizes growing with
$L$ are correctly assigned their formal infinite size in this way.
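The naive procedure of this Section can also be simulated directly, which makes its key property explicit: the color it assigns to an interior vertex is either white or the true one. The sketch below is only illustrative; it generates the broadcast coloring and applies the rule in a single recursion, with colors labeled 1..q and white encoded as 0. The names `naive_reconstruct` and `success_rate` are hypothetical.

```python
import random

def naive_reconstruct(color, depth, q, l, rng):
    """Broadcast `color` down a subtree with l sons per vertex and `depth`
    further generations, then return the colour inferred by the naive rule
    for the top vertex (0 stands for white)."""
    if depth == 0:
        return color  # leaves are observed directly
    others = [c for c in range(1, q + 1) if c != color]
    seen = set()
    for _ in range(l):
        son = rng.choice(others)  # uniform among the q-1 other colours
        inferred = naive_reconstruct(son, depth - 1, q, l, rng)
        if inferred != 0:
            seen.add(inferred)
    if len(seen) == q - 1:
        (missing,) = set(range(1, q + 1)) - seen
        return missing
    return 0

def success_rate(depth, q, l, samples, seed=0):
    """Empirical estimate of phi_L for L = depth."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        root = rng.randint(1, q)
        inferred = naive_reconstruct(root, depth, q, l, rng)
        assert inferred in (0, root)  # never a wrong non-white colour
        hits += inferred == root
    return hits / samples

print(success_rate(depth=1, q=3, l=5, samples=4000))
```

For depth 1 the estimate can be compared with the exact value $1 - 2\,(1/2)^5$ for $q=3$, $l=5$, the probability that both colors different from the root's appear among its five sons.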

To summarize the connection between this Section and the rest of the paper, the constraints that imply large 
rearrangements are precisely the information exploited in the naive reconstruction procedure. The probability of 
success of the algorithm on arbitrarily large trees can thus be identified with the order parameter of the freezing 
transition introduced in the previous Section. This identification holds for generic CSPs on random trees, provided 
one averages the success probability of the naive reconstruction over the ensemble of trees. Another suggestive 
perspective on the problem is given in terms of percolation. The order parameter can indeed be viewed as the 
probability of percolation of the support of the rearrangement from the root to an infinitely distant boundary. In 
the case of XORSAT this percolation is purely geometrical and corresponds to the existence of an infinite subtree 
where all variable nodes have degree greater than two. For COL and SAT the object which percolates is subtler: the 
rearrangements depend both on the geometry of the factor graph and on its initial solution. 



VI. FROM RANDOM TREES TO RANDOM GRAPHS 
A. Local and global aspects of the cavity method 

We turn now to the more delicate issue of the validity of the results derived in the random tree ensembles for the
original random graphs. As mentioned in Sec. II, the latter have a local tree structure, with high probability in the
thermodynamic limit. The point thus amounts to giving a description of the boundary condition induced by the rest
of the factor graph. We shall handle this problem in the framework of the cavity method [ll|, [1^ for sparse random
graphs [1^ (see also [13] for a related discussion), which we briefly survey below.

Consider a $G(N,M)$ random factor graph $F$ with $N$ variable nodes and $M = \alpha N$ constraint nodes of degree $k$
(random hypergraphs with arbitrary degree distributions can be studied similarly),
the associated random measure on $\mathcal{X}^N$,

$$ \mu(\underline{\sigma}) = \frac{1}{Z} \prod_{a=1}^{M} \psi_a(\underline{\sigma}_{\partial a}) , \tag{68} $$

and suppose the weights $\psi_a$ are i.i.d. positive random functions on $\mathcal{X}^k$ (not necessarily $\{0,1\}$ valued).

Two kinds of intertwined properties of the model can be investigated: thermodynamic (global) ones, with the
characterization of the random variable $Z$, and local ones, concerning the behavior of the measure $\mu$ itself. Because of
the self-averaging properties of $\ln Z$ for large graphs the central thermodynamic quantity is the quenched free-entropy
density,

$$ \Phi = \lim_{N\to\infty} \frac{1}{N} E \ln Z . \tag{69} $$

The latter aspect of the problem, which is the important one for our present concerns, can be formulated as follows.
Call $F_L$ the sub-factor graph induced in $F$ by the variable nodes at a graph distance smaller than $L$ from an arbitrary
root $i$, and $\underline{\sigma}_L$ the configuration of these variable nodes. As we are interested in the thermodynamic limit with $L$
finite we can assume without harm that $F_L$ is a tree. The marginalization of Eq. (68) leads to a law $\mu_L$ for $\underline{\sigma}_L$; it can
be seen as a random measure, conditioning $F$ on a given realization of $F_L$, because of the choices in $F \setminus F_L$. At this
point a question arises naturally: what is the (weak) limit of $\mu_L$ when the thermodynamic limit is taken?

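The cavity method provides a series of possible answers to this question, and a heuristic to choose the right one.
Let us introduce some notation: we denote by $b$ the number of sites in the boundary $B$ made of the sites at distance
exactly $L$ from $i$, $B = \{i_1, \dots, i_b\}$, and define a measure on $\underline{\sigma}_L$ with external fields $\eta_j$ (probability measures on $\mathcal{X}$)
acting on this boundary:

$$ \mu^{(0)}(\underline{\sigma}_L; \eta_1, \dots, \eta_b) = \frac{1}{Z_0(\eta_1, \dots, \eta_b)} \prod_{a \in F_L} \psi_a(\underline{\sigma}_{\partial a}) \prod_{j=1}^{b} \eta_j(\sigma_{i_j}) , \tag{70} $$

where $Z_0$ ensures the normalization of the law.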

The statement of the simplest (so-called Replica-Symmetric, RS) situation described by the cavity method is

$$ \mu_L(\cdot) \overset{\rm d}{\longrightarrow} \mu^{(0)}(\cdot\,; \eta_1, \dots, \eta_b) , \tag{71} $$

where the $\eta_i$ are i.i.d. from a distribution $\mathcal{P}_{(0)}$. Roughly speaking, this is true when $\mu$ is a (finite size) pure state, so
that the effect of $F \setminus F_L$ on the boundary variables can be factorized. In more complicated situations there is a large
number of pure states on which the Gibbs measure has to be decomposed for this factorization to be possible. We
shall thus introduce a new measure on $\underline{\sigma}_L$ as a weighted superposition of different $\mu^{(0)}$,

$$ \mu^{(1)}(\underline{\sigma}_L; P_1^{(1)}, \dots, P_b^{(1)}, m) = \frac{1}{Z_1[P_1^{(1)}, \dots, P_b^{(1)}, m]} \int \prod_{i=1}^{b} dP_i^{(1)}(\eta_i) \; \mu^{(0)}(\underline{\sigma}_L; \eta_1, \dots, \eta_b) \, Z_0(\eta_1, \dots, \eta_b)^m . \tag{72} $$

In this definition $m \in [0,1]$ is known as the Parisi breaking parameter, the $P_i^{(1)}$'s are distributions of fields, and again
$Z_1$ is a normalization. The hypothesis of the cavity method at the level of one step of Replica-Symmetry Breaking
(1RSB) reads

$$ \mu_L(\cdot) \overset{\rm d}{\longrightarrow} \mu^{(1)}(\cdot\,; P_1^{(1)}, \dots, P_b^{(1)}, m) , \tag{73} $$

where the $P_i^{(1)}$ are i.i.d. from a distribution $\mathcal{P}_{(1)}$. In some cases the 1RSB description coincides with the RS one, for
instance whenever the $P^{(1)}$ in the support of $\mathcal{P}_{(1)}$ are concentrated on a single value of the field (in the following we
shall call this a trivial 1RSB solution). A less obvious reduction happens when the parameter $m$ is equal to 1: from
(70,72) one realizes that in this case $\mu^{(1)}$ is indistinguishable from $\mu^{(0)}$ with properly averaged values of the external
fields, more precisely

$$ \mu^{(1)}(\underline{\sigma}_L; P_1^{(1)}, \dots, P_b^{(1)}, m=1) = \mu^{(0)}(\underline{\sigma}_L; \overline{\eta}_1, \dots, \overline{\eta}_b) \quad \text{with} \quad \overline{\eta}_i = \int dP_i^{(1)}(\eta) \, \eta . \tag{74} $$
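The reduction at $m = 1$ can be checked in a few lines. The sketch below assumes that $\mu^{(0)}$ has a product form on the boundary, i.e. that it is proportional to a weight $W(\underline{\sigma}_L)$, collecting the constraints inside $F_L$, times one factor $\eta_j(\sigma_{i_j})$ per boundary site; the symbol $W$ is introduced here only for this illustration.

```latex
% Assumption: \mu^{(0)}(\underline\sigma_L;\eta_1,\dots,\eta_b)
%   = W(\underline\sigma_L) \prod_j \eta_j(\sigma_{i_j}) / Z_0(\eta_1,\dots,\eta_b),
% so that at m = 1 the factor Z_0^m cancels the normalization of \mu^{(0)}.
\begin{align*}
Z_1 \, \mu^{(1)}(\underline\sigma_L; P_1^{(1)},\dots,P_b^{(1)}, m=1)
 &= \int \prod_{j=1}^{b} dP_j^{(1)}(\eta_j) \;
    \mu^{(0)}(\underline\sigma_L;\eta_1,\dots,\eta_b) \, Z_0(\eta_1,\dots,\eta_b) \\
 &= W(\underline\sigma_L) \prod_{j=1}^{b} \int dP_j^{(1)}(\eta_j) \, \eta_j(\sigma_{i_j}) \\
 &= W(\underline\sigma_L) \prod_{j=1}^{b} \overline{\eta}_j(\sigma_{i_j}) ,
 \qquad \overline{\eta}_j = \int dP_j^{(1)}(\eta) \, \eta .
\end{align*}
```

After normalization the last line is $\mu^{(0)}$ evaluated on the averaged fields. For $m < 1$ the factor $Z_0^m$ no longer cancels the normalization of $\mu^{(0)}$, the integrals over the $\eta_j$ do not decouple, and no such reduction takes place.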

This 1RSB formalism can be promoted to an arbitrary level of symmetry breaking by a recursive construction. Let
us call $\mathcal{M}_0$ the set of probability laws on $\mathcal{X}$, and define by recurrence $\mathcal{M}_{k+1}$ as the set of probability laws on $\mathcal{M}_k$.



Footnote: We skip the intermediate case of a finite number of pure states; for instance the low-temperature phase of an Ising ferromagnet should
be described by the superposition of the two $\mu^{(0)}$ of positive and negative magnetization.

The measure $\mu^{(K)}$ with $K$ steps of replica symmetry breaking is parameterized by $K$ reals $0 < m_1 < \dots < m_K < 1$
and $b$ elements $P_i^{(K)}$ of $\mathcal{M}_K$, and can be expressed recursively as

[Eq. (75): recursive expression of $\mu^{(K)}$ in terms of $\mu^{(K-1)}$; the formula is not recoverable from the source.]

The $K$-RSB assumption of the cavity method reads

$$ \mu_L(\cdot) \overset{\rm d}{\longrightarrow} \mu^{(K)}(\cdot\,; P_1^{(K)}, \dots, P_b^{(K)}, m_1, \dots, m_K) , \tag{76} $$

with the $P_i^{(K)}$ i.i.d. from $\mathcal{P}_{(K)}$, a given element of $\mathcal{M}_{K+1}$. Eventually the limit of an infinite number of steps of replica
symmetry breaking ($K \to \infty$) can be formally taken. Note that, as discussed in the 1RSB case, $\mu^{(K)}$ incorporates
as special cases (when the distributions concentrate on a single value, or when the $m_i$'s are degenerate) all possible
descriptions at a smaller level of RSB.

We now face the problem of choosing which of these possible assumptions is the correct one. A first
condition on the allowed values of $\mathcal{P}_{(K)}$ arises from a simple consistency requirement. $\mu_L$ can indeed be obtained in
two ways: from a direct application of the statement (76), or by considering a larger neighborhood of depth $L' > L$
and making a partial marginalization of $\mu_{L'}$. As $F_{L'} \setminus F_L$ is distributed according to a Galton-Watson branching
process, the consistency of these various ways of obtaining $\mu_L$ induces conditions restricting the possible values of
$\mathcal{P}_{(K)}$. At the RS ($K = 0$) level this is nothing but the stationarity property stated in Eq. (21). The heuristic for the
choice of $K$ and the values of the breaking parameters arises from the global aspect of the cavity method, namely
the computation of the typical value of the free-entropy density $\Phi$. More precisely, for each level of the RSB hierarchy
there is a functional $\Phi_{(K)}[\mathcal{P}_{(K)}, m_1, \dots, m_K]$ whose minimum is taken as an estimate of $\Phi$. The bounds $\Phi \le \Phi_{(K)}$
have indeed been rigorously proven in some cases [sgI IstI . [s^ , and are expected to hold with a certain generality.
The best estimate of $\Phi$, which is presumably exact in mean-field models (this has been proven in one case j59]),
should thus be sought through the minimization of $\Phi_{(K)}$ in the formal $K \to \infty$ limit which encompasses all possible
levels of RSB. The limit of $\mu_L$ is expected to be described by the set of parameters achieving this minimum (note
that the extremization of $\Phi_{(K)}$ with respect to $\mathcal{P}_{(K)}$ corresponds to the consistency requirement explained above).
This minimization is obviously a formidable task which seems out of reach in its full generality for models on sparse
random graphs. There are however partial arguments which can be used to assess the validity of the simplest RS and
1RSB hypotheses. The decay of point-to-set correlations at large distance (in other words the purity of the Gibbs
measure, or the non-reconstructibility of the value of a spin from the observation of distant sites) is indeed related to
the absence of a non-trivial solution of the 1RSB consistency equations at $m = 1$ [3^ , and suggests the RS hypothesis
to be correct. A test of the plausibility of the 1RSB description is usually performed via a local stability analysis [ioj :
one checks in this way the absence of a non-trivial solution of the 2RSB consistency equations in the vicinity of a
1RSB solution $\mathcal{P}_{(1)}$.

Let us finally underline the deep connection between these issues and the local weak convergence method developed
by Aldous (see (4ll for reviews) on related optimization problems. Recently the above stated local properties of
the RS cavity method were rigorously proven in some discrete models (cf. for instance (43l. H. 14^ ), under a priori
non-optimal conditions (worst-case vs typical decay of correlations, i.e. uniqueness vs extremality conditions [20]).



B. Minimal size rearrangements in the random graph ensembles 



We shall now reconsider the computations of the m.s.r.d. performed in the random tree ensembles in the light of the
cavity method presented above. It should be clear that the thermodynamic limit ($N \to \infty$) of the average distribution
$q_n$ defined in Eq. ^ for the original random graph ensembles coincides with the infinite $L$ limit of its tree counterpart
whenever the RS assumption stated in (71) is valid. The probability measure on the initial configuration we used for
the computation of the m.s.r. in finite tree formulae (cf. Eq. (Ilip ) corresponds indeed to the limit measure $\mu^{(0)}$ on
the finite neighborhood of the random graphs. The validity of this RS scenario depends on the particular model and
on the value of the connectivity parameter $\alpha$ ($c$ for coloring).

In the case of XORSAT [H, Q the local properties of the uniform measure over the set of solutions are well described
by the RS assumption up to the satisfiability threshold $\alpha_s$, for all values of $k$. As a consequence the computation of
$q_n$ performed in the random tree ensemble extends to random graphs throughout the satisfiable phase $\alpha < \alpha_s$, the
thresholds for the freezing transition in random graphs ($\alpha_f$) and in random trees ($\alpha_p$) are equal, and the exponents
governing the divergence of the m.s.r. in the limit $\alpha \to \alpha_p$ are correctly obtained from (64). In fact $\alpha_p$ also corresponds
to the clustering transition due to the appearance of an extensive 2-core: a rearrangement for a variable in the
2-core (more precisely in the backbone 0,0]) is necessarily of extensive size. In agreement with this correspondence,
the order parameter of the freezing transition, solution of (57), is precisely the fraction of vertices in the backbone.

The picture of the satisfiable phase of random $k$-SAT and $q$-COL advocated in [20], [21], [22] is richer. Let us first
describe it on the example of SAT. At low values of the connectivity, $\alpha < \alpha_d(k)$, one expects a plain RS description
to hold. The clustering transition $\alpha_d(k)$ corresponds to the appearance of long-range point-to-set correlations, in
other words to a non-trivial solution of the 1RSB equations with $m = 1$. In an intermediate regime $[\alpha_d(k), \alpha_c(k)]$
the thermodynamics of the system is described by a 1RSB scenario with $m = 1$, and the dominant clusters of solutions
are exponentially numerous (their complexity is strictly positive). At $\alpha_c(k)$ a condensation phenomenon occurs: the
degeneracy of the thermodynamically relevant clusters becomes sub-exponential, and the 1RSB breaking parameter
$m$ decreases from 1 to 0 as $\alpha$ increases from $\alpha_c(k)$ to the satisfiability threshold $\alpha_s(k)$. Higher levels of RSB might
be necessary to describe the condensed regime $[\alpha_c(k), \alpha_s(k)]$; we shall in the following make the hypothesis (partly
supported by [i^) that this is not the case for $\alpha < \alpha_c(k)$. Because of the equivalence, for the local properties of the
measure, of an RS description and a 1RSB one with $m = 1$ (cf. Eq. (74)), we thus expect the computation of the minimal
size rearrangements performed on the tree to be correct for random SAT formulae with $\alpha < \alpha_c(k)$. For the sake of
readability we reproduce in Table II the values of $\alpha_d(k)$ and $\alpha_c(k)$ obtained in [l^H^], along with the satisfiability
threshold $\alpha_s(k)$ of [13].

Depending on the value of $k$ the freezing threshold $\alpha_p(k)$ for the random tree ensemble is, or is not, smaller than the
condensation one. For $k \in [3,5]$ one finds $\alpha_p(k) > \alpha_c(k)$: for these values of $k$ the computation in the tree ensemble
does not allow the determination of the freezing threshold $\alpha_f(k)$ of the original ensembles (at this point we can just
say that $\alpha_f(k) > \alpha_c(k)$). For $k = 6$ the situation is reversed, $\alpha_p(6) < \alpha_c(6)$; we thus conclude that $\alpha_f(6) = \alpha_p(6)$,
and that the exponents $a$, $b$, $\nu$ describing the precursors of the freezing transition can be safely computed from (66).
We expect the ordering of the various thresholds, and hence the validity of the conclusions just stated for $k = 6$,
to remain the same for all greater values of $k$. This is corroborated by an analysis of the large $k$ limit presented in
App. A 3: the asymptotic behavior of $\alpha_p(k)$ is much smaller than the one of $\alpha_c(k)$ (20l. [2^,

$$ \alpha_f(k) = \alpha_p(k) = \frac{2^k}{k} \left( \ln k + O(\ln\ln k) \right) \ll \alpha_c(k) = 2^k \ln 2 - O(1) . \tag{77} $$

In fact the SAT problem in the limit of large $k$ becomes similar to the XORSAT problem: the threshold $\alpha_f(k) = \alpha_p(k)$
is equivalent to $2^k$ times the corresponding value for XORSAT, the order parameters at the transition are equivalent in
both problems, hence the parameter $\lambda$ governing the critical exponents becomes the same in the large $k$ limit. Moreover
from the results of [20], [22] on the behavior of the clustering threshold one realizes that the regime $[\alpha_d(k), \alpha_f(k)]$ where
clusters are present yet do not have frozen variables is of vanishing width in this limit.

The picture of the satisfiable regime for the $q$-coloring of random graphs presented in [21] is essentially the same
as the one of SAT we just described. The dynamical, condensation and satisfiability thresholds obtained in [21] are
recalled in Table II (the last two are denoted $c_g(q)$ and $c_q(q)$ in [21]). As argued above the computation performed
in the random tree ensemble should be correct for Poissonian random graphs of mean connectivity $c < c_c(q)$; for
$q \in [3,8]$ this regime does not include the tree freezing transition $c_p(q)$ (called $c_r(m=1)$ in [21]). Conversely for
$q \ge 9$ we have $c_f(q) = c_p(q)$, which is given exactly by $q(q-1)$ times the threshold of XORSAT (recall the formal
equivalence between XORSAT and the free boundary COL problem stated in Eq. (39)), and the exponents $a$, $b$, $\nu$ are
the same as in XORSAT (identifying $q$ and $k$). This ordering of the thresholds is confirmed by the analysis at large $q$,

$$ c_f(q) = q \left( \ln q + O(\ln\ln q) \right) \ll c_c(q) = 2 q \ln q - O(\ln q) , \tag{78} $$

the behavior of $c_f(q)$ being justified in App. A 3 while the one of the condensation threshold was given in [21].
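The statement that the tree freezing threshold of COL equals $q(q-1)$ times that of XORSAT can be checked numerically against the $c_p$ column of Table II. The sketch below is an illustration under stated assumptions: it takes the XORSAT tree threshold from the tangency condition of $\phi = 1 - \exp[-\alpha k \phi^{k-1}]$ (Eqs. (A4)-(A5) of App. A) and identifies $k$ with $q$; the function names are hypothetical.

```python
from math import log

def xorsat_tree_threshold(k, tol=1e-14):
    """Return (alpha_p, phi_p) solving phi = 1 - exp(-alpha*k*phi**(k-1))
    together with the tangency condition on its derivative.
    Eliminating alpha leaves (k-1)*(1-phi)*(-ln(1-phi)) = phi, solved by bisection."""
    def g(phi):
        u = 1.0 - phi
        return (k - 1) * u * (-log(u)) - phi

    lo, hi = 1e-6, 1.0 - 1e-12  # g(lo) > 0 for k >= 3, g(hi) < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    phi = 0.5 * (lo + hi)
    alpha = -log(1.0 - phi) / (k * phi ** (k - 1))
    return alpha, phi

def col_tree_threshold(q):
    """c_p(q) = q (q-1) alpha_p^XORSAT(k = q), as stated in the text."""
    alpha, _ = xorsat_tree_threshold(q)
    return q * (q - 1) * alpha

for q in (3, 4, 9):
    print(q, round(col_tree_threshold(q), 3))
```

The bisection relies on the reduced equation having a single root in $(0,1)$ for $k \ge 3$, which holds because its left-hand side minus $\phi$ has a single extremum in that interval.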

C. Dealing with RSB 

We have thus reached the frustrating conclusion that the computations performed up to now were not able to
determine the average m.s.r.d. in the condensed phase of SAT and COL, and in particular, for $k \in [3,5]$ and $q \in [3,8]$,
to locate the freezing transition and describe its critical behavior. The presentation of the cavity method of Sec. VI A



Footnote: The case $k = 3$ is special from this point of view: one finds indeed $\alpha_d(3) = \alpha_c(3)$ and no intermediate phase with an exponential number
of relevant clusters.

         COL                                        SAT
 k,q     c_d     c_f      c_p      c_c     c_s      alpha_d  alpha_f  alpha_p  alpha_c  alpha_s
  3      4       4.6      4.911    4       4.68     3.86     -        4.40     3.86     4.267
  4      8.35    8.8      9.267    8.4     8.90     9.38     -        10.55    9.547    9.931
  5      12.83   13.5     14.036   13.2    13.67    19.16    -        21.22    20.80    21.117
  6      17.64   18.6     19.112   18.4    18.88    36.53    39.87    39.87    43.08    43.37
  7      22.70   24.1     24.435   24.0    24.45
  8      27.95   29.93    29.960   29.90   30.33
  9      33.45   35.658   35.658   36.0    36.49
 10      39.0    41.508   41.508   42.5    42.9

TABLE II. Thresholds for the original random ensembles. The COL values are from [21], and from [24] for the satisfiability threshold
$c_s$; the SAT ones are from the references cited in the text, and from [13] for $\alpha_s$. For $q \in [3,8]$ the freezing threshold of [21] is computed at the 1RSB level.
A single value is reported in both the $f$ and $p$ columns when the two thresholds coincide; a dash marks a freezing threshold that is not determined.



indicates clearly what has to be done to remedy this insufficiency: one should reproduce the computations of the
m.s.r.d. on finite trees, taking for the probability law on the initial configurations $\mu^{(K)}$ instead of the $\mu^{(0)}$ we initially
considered. This generalized computation can in fact be performed in a similar way, at the price of some technical
complications, and is sketched for the $K = 1$ level of replica symmetry breaking in Appendix B. The resulting
equations become rather difficult to solve and we leave the complete determination of the distribution $q_n$ as an open
problem. One can however draw some general observations that we want to underline here. The order parameter of
the freezing transition, i.e. the fraction of rearrangements of diverging size, corresponds to the probability (over the
pure states distribution) of a variable being acted on by a hard field which constrains it to a single value. This was
found above in the three CSPs we considered when the freezing transition happens in a 1RSB phase with $m = 1$, and
will be shown in App. B to hold in non-trivial situations with $m < 1$. This should remain true for any CSP and any
further level of RSB. Another universality statement concerns the critical behavior of the distribution $q_n$ around the
freezing transition $\alpha_f$. The phenomenology described by the exponents $a$, $b$, $\nu$ can indeed be argued to persist even
when $\alpha_f$ belongs to the condensed regime $[\alpha_c, \alpha_s]$. Moreover the parameter $\lambda$ fixing the value of the exponents can
be expressed from the standard RSB computation. The reader will find in App. B the technical details leading to this
conclusion for SAT and COL at the 1RSB level, which is also expected to hold for other CSPs and higher levels of
RSB.



VII. CONCLUSIONS AND PERSPECTIVES 



One of the main themes of the paper was the distinction that has to be made between the clustering and freezing
transitions. These can coincide in sufficiently symmetric problems like XORSAT, yet in general the solution space
gets clustered without variables taking the same value in all elements of the clusters. A definition of the clustering
threshold $\alpha_d$ was put forward in [20] as the smallest connectivity such that the long-range point-to-set correlation

$$ \lim_{L\to\infty} \lim_{N\to\infty} E \sum_{\underline{\sigma}_{B_L}} \mu(\underline{\sigma}_{B_L}) \sum_{\sigma_i} \left| \mu(\sigma_i | \underline{\sigma}_{B_L}) - \mu(\sigma_i) \right| \tag{79} $$

remains positive, where $i$ is an arbitrary variable node and $\underline{\sigma}_{B_L}$ the configuration of the nodes at graph distance
exactly $L$ from $i$. A similar definition of the freezing transition $\alpha_f$ can be given in terms of the stronger notion of
correlation

$$ \lim_{L\to\infty} \lim_{N\to\infty} E \sum_{\underline{\sigma}_{B_L}} \mu(\underline{\sigma}_{B_L}) \sum_{\sigma_i} \mathbb{I}\left( \mu(\sigma_i | \underline{\sigma}_{B_L}) = 1 \right) , \tag{80} $$

hence $\alpha_f \ge \alpha_d$. The sub-optimality of the naive reconstruction algorithm given in Sec. V should clarify why this
inequality is in general strict.

In this paper we concentrated on the rearrangements of finite sizes in the thermodynamic limit, i.e. we computed
the limit $N \to \infty$ (or $L \to \infty$ in tree ensembles) of the distributions $q_n$ at a fixed value of the sizes $n$. The percolating
rearrangements thus appeared as formally infinite values of $n$ which had to be included to ensure the normalization
of the limiting $q_n$. It should be an interesting research problem to describe more precisely these diverging size
rearrangements by taking a scaling limit of $q_n$, letting $n$ grow with $N$. The leading order is expected to be linear in
$N$, as are the minimal Hamming distances between clusters studied for instance in



gore 
The divergence of the minimal size of rearrangements can be viewed as a percolation phenomenon of their supports.
In the case of XORSAT this is nothing but the classical 2-core percolation of random hypergraphs; for general CSPs,
in particular SAT and COL, the percolating structure is defined in two steps, the factor graph being equipped with
a measure on the set of initial configurations. The universality of their critical behavior, described by the exponents
$a$, $b$, $\nu$ and the relations (63) between them, is shared by other similar problems, for instance rigidity [48] and
$q$-core [49] percolation when defined on Bethe lattices. The latter problem is strongly related to kinetically constrained
models [50], for which minimal size rearrangements can also be computed and have the same critical behavior [51].

The recursion relations ([7]) could form the basis of new investigations on the structure of a single formula, following
the line of research pioneered in [l^ . Though there is no guarantee of convergence in the presence of cycles in the factor
graph, they can be turned into a heuristic message passing algorithm that will provide information on a solution of a
given instance of CSP. This solution should be found by an independent solver algorithm or, as was proposed in [s^ ,
in an incremental way: starting from an empty formula and an arbitrary assignment of the variables, constraints are
introduced one by one. Whenever the new constraint is violated by the current assignment one rearranges it; in (s^
this step was performed by a local search algorithm, which could be replaced by the single sample m.s.r. message
passing heuristic.
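The incremental strategy can be sketched as follows for SAT. This is only an illustration of the control flow: the repair step below is a plain random walk over violated clauses, used as a stand-in, and is neither the local search of the cited work nor the m.s.r. message passing heuristic. The clause encoding (a list of `(variable, value)` literals) and the names `violated` and `incremental_solve` are assumptions of this sketch.

```python
import random

def violated(clause, assignment):
    """A clause is a list of (variable, value) literals; it is violated
    when none of its variables takes the requested value."""
    return all(assignment[v] != val for v, val in clause)

def incremental_solve(n_vars, clauses, rng, max_flips=100_000):
    """Add the clauses one by one, repairing the current assignment
    whenever some active clause is violated."""
    assignment = [rng.random() < 0.5 for _ in range(n_vars)]
    active = []
    for clause in clauses:
        active.append(clause)
        flips = 0
        while True:
            bad = [c for c in active if violated(c, assignment)]
            if not bad:
                break
            if flips >= max_flips:
                return None  # repair budget exhausted
            # Stand-in repair step: flip a random variable of a random violated clause.
            v, _ = rng.choice(rng.choice(bad))
            assignment[v] = not assignment[v]
            flips += 1
    return assignment
```

In the scheme suggested in the text, the repair step would instead flip the variables of a small rearrangement rooted at one variable of the newly added clause.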

The study of the rearrangements of XORSAT performed in [26] addressed further issues left aside in the present
work. One was the characterization of the geometrical properties of the m.s.r., through the distribution of their average
depths and a measure of their cooperativity by a geometrical susceptibility. We expect some of these geometrical
results to extend from XORSAT to arbitrary CSPs, in particular the value of the critical exponents $\zeta = 1/2$, $\eta = 1$
(see [1^ for their definitions). Another aspect should on the contrary be much more problem dependent, namely the
structure of the energy barriers between rearranged configurations. Given a pair of satisfying assignments $\underline{\sigma}$, $\underline{\tau}$ one
can define the set of paths in the configuration space which lead from one to the other by modifying one variable
at a time, each variable being modified at most once. The barrier between $\underline{\sigma}$ and $\underline{\tau}$ can be defined as the minimum
over this set of paths of the maximum along the path of the number of violated constraints. One can then study
the rearrangements which modify a given variable $i$ and achieve a minimal value of the barrier between the initial
and final configurations. The structure of XORSAT is such that minimal barrier and minimal size rearrangements
do not coincide, and that energy barriers are always strictly positive (unless the variable appears in no constraint,
flipping a variable always makes at least one constraint unsatisfied). On the contrary for SAT a finite size
rearrangement can always be performed while remaining in the set of satisfying configurations: one just has to flip the
variables in decreasing order with respect to the distance from the root of the rearrangement.
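The definition of the barrier can be made concrete with a brute-force sketch, usable only on very small instances since it enumerates all orderings of the modified variables. Constraints are represented here as Python predicates on the configuration, an encoding chosen for this illustration; the function name `barrier` is hypothetical.

```python
from itertools import permutations

def barrier(sigma, tau, constraints):
    """Minimum, over all orders in which the differing variables are
    modified once each, of the maximal number of violated constraints
    met along the path from sigma to tau."""
    diff = [i for i, (s, t) in enumerate(zip(sigma, tau)) if s != t]
    best = None
    for order in permutations(diff):
        x = list(sigma)
        worst = 0
        for i in order:
            x[i] = tau[i]
            worst = max(worst, sum(1 for c in constraints if not c(x)))
        if best is None or worst < best:
            best = worst
    return best

# XORSAT: the single equation x0 + x1 = 0 (mod 2); any first flip violates it.
xor_constraints = [lambda x: (x[0] ^ x[1]) == 0]
print(barrier((0, 0), (1, 1), xor_constraints))

# SAT: the single clause (x0 or x1); setting x1 first keeps it satisfied.
sat_constraints = [lambda x: bool(x[0] or x[1])]
print(barrier((1, 0), (0, 1), sat_constraints))
```

The two toy cases mirror the contrast drawn in the text: the XORSAT path necessarily crosses a configuration with one violated equation, while the SAT path can stay among satisfying assignments when the variables are flipped in a suitable order.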

Let us finally mention that the general formalism can be applied to several CSPs besides the three examples on
which we concentrated. For instance the bicoloring of random hypergraphs [53], which admits a stationary free
boundary, is easily seen to have a freezing transition in random tree ensembles with branching ratio

$$ \alpha_p^{\rm (BICOL)} = (2^{k-1} - 1) \, \alpha_p^{\rm (XORSAT)} . \tag{81} $$
APPENDIX A: CRITICAL BEHAVIOR AROUND THE FREEZING TRANSITION



1. XORSAT 



In this appendix we give some details on the asymptotic behavior of the average m.s.r.d. in the neighborhood
of the freezing transition in the random tree ensembles. The case of XORSAT was treated in [26]; the main interest
will thus be in the extension of these results to the SAT problem. For the sake of clarity we first recall briefly some
of the key points of App. C in [26].

Let us define the generating functions of $\hat q_n$ and $q_n$ as

$$ \hat R(x) = \sum_{n=1}^{\infty} \hat q_n x^n , \qquad R(x) = \sum_{n=1}^{\infty} q_n x^n . \tag{A1} $$

The equations (28,27) can be rewritten as

$$ \hat Q_n = Q_n^{k-1} , \tag{A2} $$

$$ R(x) = x \exp[-\alpha k + \alpha k \hat R(x)] . \tag{A3} $$

The order parameter $\phi = \lim_{n\to\infty} Q_n$ can also be expressed as $1 - R(x=1)$; the equation determining $\phi$ is formally
written as $\phi = V(\phi, \alpha)$ with $V(\phi, \alpha) = 1 - \exp[-\alpha k \phi^{k-1}]$. At the transition point $(\alpha_p, \phi_p)$ we have $\partial_\phi V = 1$: the two
curves $y = \phi$ and $y = V(\phi, \alpha_p)$ become tangent at this point. More explicitly,

$$ \phi_p = 1 - \exp[-\alpha_p k \phi_p^{k-1}] , \tag{A4} $$
$$ 1 = \alpha_p k (k-1) \phi_p^{k-2} \exp[-\alpha_p k \phi_p^{k-1}] . \tag{A5} $$

Consider first the large $n$ regime right at the transition ($\alpha = \alpha_p$), and assume that the decay of $Q_n$ towards the
plateau $\phi_p$ is algebraic, $Q_n \simeq \phi_p + A \, n^{-a}$, with $A$ a positive constant and $a$ a positive exponent. Expanding Eq. (A2)
with this ansatz, we obtain

$$ \hat Q_n \simeq \phi_p^{k-1} + (k-1) \phi_p^{k-2} A \, n^{-a} + \frac{(k-1)(k-2)}{2} \phi_p^{k-3} A^2 \, n^{-2a} . \tag{A6} $$

The properties of generating functions (similar to Laplace transforms) lead to algebraic singularities of $R$ and $\hat R$
around $x = 1$ [53]:

$$ R(1-s) \simeq 1 - \phi_p - A \, \Gamma(1-a) \, s^{a} , \tag{A7} $$

$$ \hat R(1-s) \simeq 1 - \phi_p^{k-1} - (k-1) \phi_p^{k-2} A \, \Gamma(1-a) \, s^{a} - \frac{(k-1)(k-2)}{2} \phi_p^{k-3} A^2 \, \Gamma(1-2a) \, s^{2a} , \tag{A8} $$

where the equivalences hold in the $s \to 0$ limit, and $\Gamma$ is Euler's Gamma function. Inserting these expressions in
Eq. (A3), one can expand in powers of $s$ and identify the terms of order $s^0$, $s^a$ and $s^{2a}$ on both sides of the equation.
The first two orders compensate because of, respectively, the relation on the order parameter (A4) and its derivative
(A5). The order $s^{2a}$ fixes the exponent $a$ under the form (63), with $\lambda$ given by (64).
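The identification of orders can be written out explicitly. The following is a sketch, assuming the forms $R(x) = x \exp[-\alpha k + \alpha k \hat R(x)]$ and $\hat Q_n = Q_n^{k-1}$ together with the ansatz $Q_n \simeq \phi_p + A n^{-a}$; the shorthand $c$ is introduced here only for convenience. The factor $(1-s)$ contributes at order $s$, which is subleading with respect to $s^{2a}$ since $2a < 1$.

```latex
% Shorthand for this sketch: c = \alpha_p k (k-1) \phi_p^{k-2}, so that (A5) reads c (1-\phi_p) = 1.
\begin{align*}
(1-s)\, e^{-\alpha_p k \,[1-\hat R(1-s)]}
 &\simeq (1-\phi_p) \Big[ 1 - c\, A\, \Gamma(1-a)\, s^{a}
   - \tfrac{1}{2} \alpha_p k (k-1)(k-2)\, \phi_p^{k-3} A^2\, \Gamma(1-2a)\, s^{2a} \\
 &\qquad\qquad\quad + \tfrac{1}{2}\, c^2 A^2\, \Gamma(1-a)^2\, s^{2a} \Big] , \\
\text{order } s^{a}: \quad
 & (1-\phi_p)\, c\, A\, \Gamma(1-a) = A\, \Gamma(1-a) \quad \text{by (A5)} , \\
\text{order } s^{2a}: \quad
 & \frac{\Gamma(1-a)^2}{\Gamma(1-2a)}
   = \frac{\alpha_p k (k-1)(k-2)\, \phi_p^{k-3}}{c^2}
   = \frac{(k-2)(1-\phi_p)}{\phi_p} .
\end{align*}
```

If the expansions are as written here, the right-hand side of the last line is the quantity to be compared with the parameter $\lambda$ of Eq. (64). A possible subleading term $B\, n^{-2a}$ in $Q_n$ would not change this condition, since its contributions to both sides cancel thanks to (A5), in the same way as the order $s^a$ terms.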

We now consider the limit $\alpha \to \alpha_p$ and denote $\delta = \alpha_p - \alpha$ the (vanishing) distance to the transition. There are two
scaling regimes to be distinguished; the first governs the behavior of $Q_n$ in the neighborhood of the plateau. Suppose
this regime is reached on a scale $n_1(\delta)$ diverging as $\delta \to 0$ and described by the following scaling function:

$$ \epsilon(t) = \lim_{\delta \to 0} \delta^{-1/2} \left[ Q_{n = t\, n_1(\delta)} - \phi_p \right] . \tag{A9} $$

Expanding Eqs. (A2,A3) order by order in $\delta$, one finds similarly (see [26] for details) that the first two orders are
satisfied thanks to relations (A4,A5), while the third leads to an integro-differential equation for the scaling function
$\epsilon(t)$. The important feature of $\epsilon(t)$ is its behavior in the small and large $t$ limits (entrance to and exit from the plateau):

$$ \epsilon(t) \underset{t \to 0}{\sim} t^{-a} , \qquad \epsilon(t) \underset{t \to \infty}{\sim} -t^{b} , \tag{A10} $$

where $a$ is the same exponent as before, and $b$ the dual one (cf. Eq. ([55])). In fact the small $t$ behavior of $\epsilon$ allows
one to fix the still undetermined scale $n_1(\delta)$: for a large, yet independent of $\delta$, value of $n$, the study right at $\alpha_p$ led to
$Q_n - \phi_p \sim n^{-a}$. For consistency we must have $n^{-a} \sim \delta^{1/2} (n/n_1(\delta))^{-a}$, which implies $n_1(\delta) \sim \delta^{-1/(2a)}$.

The second scaling regime describes the decay of $Q_n$ from its plateau value down to zero, i.e. the distribution of
the almost-frozen rearrangements whose size diverges as $\alpha$ reaches $\alpha_p$. Suppose again the existence of another
scale $n_2(\delta)$ for this to happen, and of the scaling function

$$ Q(t) = \lim_{\delta \to 0} Q_{n = t\, n_2(\delta)} . \tag{A11} $$

Plugging this ansatz in Eqs. (A2,A3) one obtains another equation for $Q(t)$, which implies in particular $Q(t) - \phi_p \sim -t^{b}$
at small $t$. Matching the small $t$ behavior of $Q(t)$ with the large $t$ limit of the previous scaling function $\epsilon(t)$, one finds
that $n_2(\delta) \sim \delta^{-\nu}$, with $\nu = 1/(2a) + 1/(2b)$, as announced in the main part of the text.
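The two matching conditions can be summarized in a short worked computation. This is a sketch of the argument only: the labels $n_1(\delta)$ and $n_2(\delta)$ for the plateau scale and for the scale of the final decay are chosen here for convenience, and proportionality constants are dropped.

```latex
\begin{align*}
\text{entrance } (t \to 0): \quad
 & \delta^{1/2}\, \big( n / n_1(\delta) \big)^{-a} \sim n^{-a}
 \;\Longrightarrow\; n_1(\delta) \sim \delta^{-1/(2a)} , \\
\text{exit } (t \to \infty): \quad
 & \delta^{1/2}\, \big( n / n_1(\delta) \big)^{b} \sim \big( n / n_2(\delta) \big)^{b}
 \;\Longrightarrow\; n_2(\delta) \sim n_1(\delta)\, \delta^{-1/(2b)} = \delta^{-\nu} ,
 \qquad \nu = \frac{1}{2a} + \frac{1}{2b} .
\end{align*}
```

The first line equates the plateau scaling form with the critical decay found right at the transition; the second equates the exit from the plateau with the onset of the final decay, both being departures from $\phi_p$ of the same sign.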
2. SAT



The same steps, with some technical adaptations, can be followed in the case of SAT. Let us first define the 
integrated distributions and the generating functions for each value of the conditioning field: 



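$$Q_n(h) = \sum_{n' \ge n} q_{n'}(h) , \quad \hat Q_n(u) = \sum_{n' \ge n} \hat q_{n'}(u) , \quad R(h,x) = \sum_n q_n(h)\, x^n , \quad \hat R(u,x) = \sum_n \hat q_n(u)\, x^n . \tag{A12}$$

We rewrite Eqs. (53,54,55) as

$$\hat Q_n(u)\, \hat{\mathcal P}(u) = \int \prod_{i=1}^{k-1} d\mathcal P(h_i)\; \delta(u - f(h_1,\dots,h_{k-1})) \prod_{i=1}^{k-1} \frac{1 - \tanh h_i}{2}\, Q_n(h_i) \qquad \text{for } n \ge 1 , \tag{A13}$$

$$R(h,x)\, \mathcal P(h) = x \sum_{l_+,l_-=0}^{\infty} \frac{e^{-\alpha k} (\alpha k/2)^{l_+ + l_-}}{l_+!\, l_-!} \int \prod_{i=1}^{l_+} d\hat{\mathcal P}(u_i^+) \prod_{i=1}^{l_-} d\hat{\mathcal P}(u_i^-)\; \delta\!\left(h - \sum_{i=1}^{l_+} u_i^+ + \sum_{i=1}^{l_-} u_i^-\right) \prod_{i=1}^{l_-} \hat R(u_i^-, x) , \tag{A14}$$

$$Q_n = \int d\mathcal P(h)\, (1 - \tanh h)\, Q_n(h) . \tag{A15}$$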



Recall that the functional order parameters $\phi(h) = \lim_{n\to\infty} Q_n(h) = 1 - R(h, x=1)$ and $\hat\phi(u)$ are solutions of the equations (59,60): we denote $\phi_p(h)$ and $\hat\phi_p(u)$ their values at the threshold $\alpha_p$ for the appearance of a non-trivial solution, and $\phi_p = \lim_{n\to\infty} Q_n = \int d\mathcal P(h)\, (1 - \tanh h)\, \phi_p(h)$.

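For our purposes it will be sufficient to work with the simplified versions of Eqs. (A13,A14) obtained by integration over the fields:

$$\int d\hat{\mathcal P}(u)\, \hat Q_n(u) = \left( \int d\mathcal P(h)\, \frac{1 - \tanh h}{2}\, Q_n(h) \right)^{k-1} = \frac{1}{2^{k-1}}\, Q_n^{k-1} , \tag{A16}$$

$$\int d\mathcal P(h)\, R(h,x) = x \exp\left[ - \frac{\alpha k}{2} + \frac{\alpha k}{2} \int d\hat{\mathcal P}(u)\, \hat R(u,x) \right] . \tag{A17}$$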

Consider now the behavior of these quantities right at the transition $\alpha_p$. The simplest hypothesis is to assume the existence of a single exponent $a$ describing the decay of the integrated distributions $Q_n(h)$, $\hat Q_n(u)$ towards their limits (as $n \to \infty$) $\phi(h)$, $\hat\phi(u)$, independently of $h$, $u$. This hypothesis is customary in the formally analogous mode-coupling theory of liquids [31], where the role of the conditioning field is held by a wave vector. We thus make the ansatz $Q_n(h) \simeq \phi_p(h) + A(h)\, n^{-a}$ with $A(h)$ a positive function. Expanding Eq. (A16), this leads to



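$$\int d\hat{\mathcal P}(u)\, \hat Q_n(u) \simeq \left( \frac{\phi_p}{2} \right)^{k-1} + \frac{k-1}{2^{k-1}}\, \phi_p^{k-2} \left[ \int d\mathcal P(h)\, (1 - \tanh h)\, A(h) \right] n^{-a} \tag{A18}$$

$$\qquad\qquad + \frac{(k-1)(k-2)}{2^{k}}\, \phi_p^{k-3} \left[ \int d\mathcal P(h)\, (1 - \tanh h)\, A(h) \right]^2 n^{-2a} . \tag{A19}$$

These algebraic decays at large $n$ translate into singularities in the generating functions around $x = 1$,

$$\int d\mathcal P(h)\, R(h, 1-s) \simeq 1 - \int d\mathcal P(h)\, \phi_p(h) - \left( \int d\mathcal P(h)\, A(h) \right) \Gamma(1-a)\, s^{a} , \tag{A20}$$

$$\int d\hat{\mathcal P}(u)\, \hat R(u, 1-s) \simeq 1 - \left( \frac{\phi_p}{2} \right)^{k-1} - \frac{k-1}{2^{k-1}}\, \phi_p^{k-2} \left[ \int d\mathcal P(h)\, (1 - \tanh h)\, A(h) \right] \Gamma(1-a)\, s^{a} \tag{A21}$$

$$\qquad\qquad - \frac{(k-1)(k-2)}{2^{k}}\, \phi_p^{k-3} \left[ \int d\mathcal P(h)\, (1 - \tanh h)\, A(h) \right]^2 \Gamma(1-2a)\, s^{2a} . \tag{A22}$$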



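Finally these expansions are inserted in (A17): collecting the terms of order $s^0$, $s^{a}$ and $s^{2a}$ yields the following three equations:

$$\int d\mathcal P(h)\, \phi_p(h) = 1 - \exp\left[ - \frac{\alpha_p k}{2^k}\, \phi_p^{k-1} \right] , \tag{A23}$$

$$\int d\mathcal P(h)\, A(h) = \frac{\alpha_p k (k-1)}{2^k}\, \phi_p^{k-2} \exp\left[ - \frac{\alpha_p k}{2^k}\, \phi_p^{k-1} \right] \int d\mathcal P(h)\, (1 - \tanh h)\, A(h) , \tag{A24}$$

$$\frac{\Gamma^2(1-a)}{\Gamma(1-2a)} = \lambda = \frac{2^k (k-2)}{\alpha_p k (k-1)\, \phi_p^{k-1}} . \tag{A25}$$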
FIG. 5: The scaling functions of the average m.s.r.d. for the random tree ensemble of 3-SAT. The almost superimposed curves correspond to $\alpha = 4.39$, $4.392$, $4.396$. Left: intermediate scale $t = n (\alpha_p - \alpha)^{1/(2a)}$, see Eq. (A9). Right: final scale $t = n (\alpha_p - \alpha)^{\nu}$, cf. Eq. (A11); the dashed horizontal line indicates the order parameter $\phi_p$. Numerical values of the exponents can be found in Table I.



The first is a direct consequence of the equations (59,60) on the order parameter, and can also be derived from (A16,A17), setting $x = 1$ in the latter.

The second is a functional analog of (A5) and deserves a short explanation. The order parameter $\phi(h)$ is defined as the solution of a fixed-point functional equation of the type $\phi_\alpha = V[\phi_\alpha, \alpha]$, where we keep implicit the functional character of $\phi$ but emphasize the dependence on the control parameter $\alpha$. The relevant non-trivial solution of this equation, which exists for $\alpha > \alpha_p$, disappears at $\alpha_p$; this is a bifurcation point in the vocabulary of discrete dynamical systems. A powerful tool in this context is the implicit function theorem: if for some value $\alpha_0$ there is a solution $\phi_{\alpha_0}$, and if the differential of $V$ with respect to $\phi$ in $(\phi_{\alpha_0}, \alpha_0)$ has no eigenvector of eigenvalue 1, then the solution $\phi_\alpha$ can be continuously followed in a neighborhood of $\alpha_0$. At the bifurcation point $\alpha_p$ the hypothesis of the theorem must be violated. Linearizing Eqs. (59,60), the reader will easily verify that an eigenvector of eigenvalue 1 of the differential satisfies Eq. (A24). We can thus assume $A(h)$ to be in this eigenspace for the second condition to be verified (see the footnote). Note that for a real order parameter equation $\phi = V(\phi, \alpha)$, this condition is nothing but the equality of the derivatives $1 = \partial_\phi V$ at a transition, as used for instance in (A5).

The third equation fixes the exponent $a$ and gives the value of the parameter $\lambda$, as was claimed in the main part of the text (cf. Eqs. (55) and (66)).
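As a rough numerical illustration, the exponent $a$ can be obtained from a given $\lambda$ by inverting $\Gamma^2(1-a)/\Gamma(1-2a) = \lambda$ on $0 < a < 1/2$. The sketch below also computes the dual exponent $b$ under the assumption that it obeys the relation customary in mode-coupling theory, $\Gamma^2(1+b)/\Gamma(1+2b) = \lambda$ (the precise form of Eq. (55) is not reproduced in this appendix), and then $\nu = 1/(2a) + 1/(2b)$. The function names are hypothetical.

```python
import math

def _bisect_decreasing(func, lo, hi, iters=200):
    # Root of a monotonically decreasing function on [lo, hi].
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if func(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def exponent_a(lam):
    # Gamma(1-a)^2 / Gamma(1-2a) decreases from 1 to 0 on (0, 1/2).
    f = lambda a: math.gamma(1.0 - a) ** 2 / math.gamma(1.0 - 2.0 * a) - lam
    return _bisect_decreasing(f, 1e-12, 0.5 - 1e-12)

def exponent_b(lam):
    # Assumed dual relation: Gamma(1+b)^2 / Gamma(1+2b) decreases from 1 towards 0 for b > 0.
    f = lambda b: math.gamma(1.0 + b) ** 2 / math.gamma(1.0 + 2.0 * b) - lam
    return _bisect_decreasing(f, 1e-12, 50.0)

def exponent_nu(lam):
    # nu = 1/(2a) + 1/(2b), the exponent of the final scaling regime.
    a, b = exponent_a(lam), exponent_b(lam)
    return 1.0 / (2.0 * a) + 1.0 / (2.0 * b)

if __name__ == "__main__":
    lam = 0.5  # arbitrary illustrative value with 0 < lam < 1
    print(exponent_a(lam), exponent_b(lam), exponent_nu(lam))
```

Both bisections require $0 < \lambda < 1$; the value of $\lambda$ itself has to be taken from the third equation for the model at hand.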

The study of the intermediate and final scaling regimes can be done following the lines sketched above on the XORSAT example; for instance the behavior around the plateau is described, for all values of the cavity fields, by a single scaling function, generalizing Eq. (A9) to

$$Q_{n = t\, n_1(\delta)}(h) \simeq \phi_p(h) + \delta^{1/2} A(h)\, \epsilon(t) . \tag{A26}$$

Provided $A(h)$ is chosen in such a way that Eq. (A24) is verified, $\epsilon(t)$ obeys the same kind of integro-differential equation as the scaling function of the XORSAT problem, and in particular its behavior at small and large $t$ is identical (see Eq. (A10)). We thus reach exactly the same conclusions on the behavior of $n_1(\delta)$ and $n_2(\delta)$. This is confirmed in Fig. 5, which shows, in the two regimes, a good collapse of numerically determined distributions $Q_n$ for three values of $\alpha$ approaching $\alpha_p$.
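The collapse amounts to plotting $Q_n$ against a rescaled size. A minimal sketch of the two rescalings quoted in the caption of Fig. 5, $t = n(\alpha_p - \alpha)^{1/(2a)}$ and $t = n(\alpha_p - \alpha)^{\nu}$ with $\nu = 1/(2a) + 1/(2b)$, is given below; the function name and the numerical inputs are purely illustrative and are not values from the paper.

```python
def rescaled_sizes(n, alpha, alpha_p, a, b):
    """Return (t_intermediate, t_final) for a rearrangement size n.

    t_intermediate = n * (alpha_p - alpha)^(1/(2a))   (plateau regime)
    t_final        = n * (alpha_p - alpha)^nu         (decay from the plateau)
    """
    delta = alpha_p - alpha
    if delta <= 0:
        raise ValueError("the rescaling is defined for alpha < alpha_p")
    nu = 1.0 / (2.0 * a) + 1.0 / (2.0 * b)
    return n * delta ** (1.0 / (2.0 * a)), n * delta ** nu

if __name__ == "__main__":
    # Illustrative numbers only.
    print(rescaled_sizes(n=1000, alpha=0.9, alpha_p=1.0, a=0.25, b=0.5))
```

Curves obtained for different values of $\alpha$ should fall on top of each other when plotted against the appropriate rescaled variable, provided $\alpha$ is close enough to $\alpha_p$.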



3. Asymptotics at large k,q 



We now justify the statements made in the main part of the text on the large $k$, $q$ behavior of the freezing thresholds. This analysis is simple in the case of XORSAT: from (A4,A5) one obtains a closed equation on the order parameter at the transition,

$$\frac{1}{k-1} = - \frac{(1 - \phi_p(k)) \ln(1 - \phi_p(k))}{\phi_p(k)} , \tag{A27}$$



Footnote: this explanation is of course heuristic; the functional character of the fixed point equation makes the invocation of the implicit function theorem rather fuzzy.
which can be inverted to obtain an asymptotic expansion of $\phi_p(k)$. Reinserting it in Eq. (A5) yields

$$\alpha_p(k) = \frac{1}{k} \left( \ln k + \ln\ln k + 1 + O\!\left( \frac{\ln\ln k}{\ln k} \right) \right) . \tag{A28}$$

The formal correspondence with the COL problem (see Eq. ) leads immediately to the left hand side of (75).
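The quality of this expansion can be gauged numerically. The sketch below solves the closed equation on $\phi_p(k)$ by bisection and compares the resulting XORSAT threshold with the leading terms $(\ln k + \ln\ln k + 1)/k$. It assumes that the order parameter equation has the form $\phi = 1 - \exp(-\alpha k \phi^{k-1})$, so that $\alpha_p = -\ln(1-\phi_p)/(k\,\phi_p^{k-1})$; this form is an assumption, since Eqs. (A4,A5) are not reproduced here, and the function names are hypothetical.

```python
import math

def _h(phi):
    # Right-hand side of the closed equation: -(1-phi) ln(1-phi) / phi, decreasing in phi.
    return -(1.0 - phi) * math.log1p(-phi) / phi

def phi_p(k, iters=200):
    # Solve _h(phi) = 1/(k-1) on (0, 1) by bisection.
    target = 1.0 / (k - 1)
    lo, hi = 1e-12, 1.0 - 1e-15
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if _h(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def alpha_p_exact(k):
    # Assumed relation between threshold and order parameter (see lead-in).
    phi = phi_p(k)
    return -math.log1p(-phi) / (k * phi ** (k - 1))

def alpha_p_asymptotic(k):
    # Leading terms of the large-k expansion.
    return (math.log(k) + math.log(math.log(k)) + 1.0) / k

if __name__ == "__main__":
    for k in (3, 10, 100, 1000, 10**6):
        print(k, alpha_p_exact(k), alpha_p_asymptotic(k))
```

The ratio of the two expressions approaches 1 only slowly as $k$ grows, in line with the $O(\ln\ln k/\ln k)$ correction.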

The distributions of fields $\mathcal P(h)$, $\hat{\mathcal P}(u)$ for random SAT formulas can be shown from ([5^ to concentrate in the large $k$ limit around, respectively, $0$ and $2^{-k}$. The equations (53,54) on $q_n(h)$, $\hat q_n(u)$ can thus be simplified at the leading order in $k$ by retaining only these deterministic values of the conditioning fields. A simple transformation then shows that the distribution $q_n(h = 0)$ collapses onto the solution of the XORSAT equations (31,32), provided the connectivity $\alpha$ is divided by a factor of $2^k$. This leads to the asymptotic behavior of the freezing threshold stated in Eq. (77), and to the equivalence at large $k$ of the exponents $a$, $b$, $\nu$ in the SAT and XORSAT problems. A systematic expansion in powers of $2^{-k}$ of the deviations between the two models could be set up from this starting point.

APPENDIX B: MINIMAL SIZE REARRANGEMENTS AT THE 1RSB LEVEL

1. General case 

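We consider in this appendix the computation proposed in Sec. IV C, namely the determination of the m.s.r.d. for a finite tree factor graph whose initial configuration is drawn according to the law $\mu^{(1)}$ (see Eq. (17)). To characterize it we introduce on each directed edge of the factor graph a distribution of cavity fields, denoted $P_{i\to a}(\eta)$ and $P_{a\to i}(\nu)$. They obey the following set of equations,

$$P_{a\to i}(\nu) = \frac{1}{Z(\{P_{j\to a}\})} \int \prod_{j \in \partial a \setminus i} dP_{j\to a}(\eta_{j\to a})\; \delta(\nu - f(\{\eta_{j\to a}\}))\; z(\{\eta_{j\to a}\})^m , \tag{B1}$$

$$P_{i\to a}(\eta) = \frac{1}{Z(\{P_{b\to i}\})} \int \prod_{b \in \partial i \setminus a} dP_{b\to i}(\nu_{b\to i})\; \delta(\eta - g(\{\nu_{b\to i}\}))\; z(\{\nu_{b\to i}\})^m , \tag{B2}$$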

where the functions $f$, $g$ and $z$ are the ones defined in (12,13) for the corresponding edges. The boundary condition is given by $P_{i\to a}(\eta) = P_{\mathrm{ext},i}(\eta)$ if $i \in B$, otherwise $P_{i\to a}(\eta) = \delta(\eta - \overline{\eta})$. The marginals of $\mu^{(1)}$ can be obtained from these distributions, for instance for a single variable one obtains

$$\mu^{(1)}(\sigma_i) = \int dP_i(\eta)\; \eta(\sigma_i) , \qquad P_i(\eta) = \frac{1}{Z(\{P_{a\to i}\})} \int \prod_{a \in \partial i} dP_{a\to i}(\nu_{a\to i})\; \delta(\eta - g(\{\nu_{a\to i}\}))\; z(\{\nu_{a\to i}\})^m . \tag{B3}$$

We also have to introduce distributions of the size messages, 9,^'^°'°^''' (?/) and which corresponds to the 

weighted averages of the distributions in a single fi'-°\ From (|18ll9p one obtains 

P_,(^) g</^^^^')(i.) = J I n dP,^a{ll,~^a) 5{V - f{{r^,^a])) ^({^,^a})™ 

and 



with the boundary condition at the leaves 9,^*^° '^^ (??) — 5fi si^„y Finally the m.s.r.d. with respect to /i^^-* for a variable 
i reads 

9« = / dP(^) ^(a.) qf''^\^) [„-]^^ , (B6) 
with



n E ^"^(^^^^) • (B7) 



Note that this computation reduces to the one of Sec. III either when the distributions of cavity fields are concentrated on a single value or when $m = 1$, defining in the latter case

and similarly for $\nu_{a\to i}$ and the corresponding size distribution. For a generic value of $m$ one proceeds with the computation of the average m.s.r.d. for a random tree; the only modification with respect to Sec. III A 3 is a replacement of the distribution of external fields $\mathcal P(\eta)$ by a distribution of distributions of fields, $\mathcal P(P)$. One has thus to define $q^{(L)}_{\eta,n}(P)$, the average of the joint law of the field $\eta$ and of the size $n$ for the root of $T_L$, conditioned on the event $P = P_i$, and similarly $\hat q^{(L)}_{\nu,n}(\hat P)$ for $\hat T_L$. These quantities can be obtained by recursions on $L$ through equations formally similar to (22,23), which could in principle be solved numerically using a population of populations of elements $(\eta, n)$. We shall give the explicit form of these equations in the two particular cases of SAT and COL in the following two subsections.



2. SAT 



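For random SAT instances the stationarity conditions for the distributions of distributions of fields $\mathcal P(P)$, $\hat{\mathcal P}(\hat P)$ can be written in their distributional form as

$$\hat P \overset{d}{=} F(P_1,\dots,P_{k-1}) , \qquad P \overset{d}{=} G(\hat P_1^+,\dots,\hat P_{l_+}^+,\hat P_1^-,\dots,\hat P_{l_-}^-) . \tag{B9}$$

The functionals $F$ and $G$ are defined by

$$\hat P(u) = \frac{1}{Z(\{P_i\})} \int \prod_{i=1}^{k-1} dP_i(h_i)\; \delta(u - f(h_1,\dots,h_{k-1}))\; z(h_1,\dots,h_{k-1})^m , \tag{B10}$$

$$P(h) = \frac{1}{Z(\{\hat P_i^{\pm}\})} \int \prod_{i=1}^{l_+} d\hat P_i^+(u_i^+) \prod_{i=1}^{l_-} d\hat P_i^-(u_i^-)\; \delta\!\left( h - \sum_{i=1}^{l_+} u_i^+ + \sum_{i=1}^{l_-} u_i^- \right) z(u_1^+,\dots,u_{l_+}^+,u_1^-,\dots,u_{l_-}^-)^m , \tag{B11}$$

where

$$z(h_1,\dots,h_{k-1}) = 2 - \prod_{i=1}^{k-1} \frac{1 - \tanh h_i}{2} , \tag{B12}$$

$$z(u_1^+,\dots,u_{l_+}^+,u_1^-,\dots,u_{l_-}^-) = \prod_{i=1}^{l_+} \frac{1 + \tanh u_i^+}{2} \prod_{i=1}^{l_-} \frac{1 - \tanh u_i^-}{2} + \prod_{i=1}^{l_+} \frac{1 - \tanh u_i^+}{2} \prod_{i=1}^{l_-} \frac{1 + \tanh u_i^-}{2} . \tag{B13}$$

The conditional averages of the joint law of cavity field and sizes obey the two following equations,

/A. X p X

n dV(P) S{P - F{{P}))^^^^ J n dh^ 6{u - f{h,, hk-i)) z{h,, hk-iY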



i=l 



(B14) 
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1+ / 1+ \ /+ /_ 

i=l i=l \ i=l i=l j 1=1 ni,. i=\ 

These equations conserve the conditions $\sum_n q_{h,n}(P) = P(h)$ which follow from the definition of $q_{h,n}(P)$. Finally the average m.s.r.d. for the root of $T_L$ reads

$$q_n = \int d\mathcal P(P) \int dh\; (1 - \tanh h)\; q_{h,n}(P) . \tag{B16}$$

As a consistency check one can reduce these equations to the ones developed in the main part of the text (cf. (53,54,55)) when $m = 1$, using the identity (B8).

Let us come back to the 1RSB equations (B9,B10,B11). It is possible for the distributions $\hat P(u)$ in the support of $\hat{\mathcal P}$ to acquire a peak on the hard field value $u = +\infty$, of intensity denoted $\hat\phi(\hat P)$. This corresponds to a field forcing the variable node to satisfy the constraint node emitting the message. Similarly we call $\phi(P)$ the intensity of the peak in $h = -\infty$, signaling to a clause that the emitting variable is forced to unsatisfy it. These intensities are found from (B9,B10,B11) to obey



, fe-l , k~l 



1{P)V{P) = JU dV{P^) 5{P - n{P^}))Yl^^ n ^^P^) ' (B17) 

/• '+ ^ ^ ^ ^ ^ 1 



' l+tanh u ^ 



JdP-{u){^^j 

A randomly chosen variable will receive a forcing hard field in a randomly chosen pure state with probability

$$\phi = 2 \int d\mathcal P(P)\; \phi(P) , \tag{B19}$$

where the factor 2 comes from the symmetry between positive and negative literals. $\phi$ is also the order parameter of the freezing transition; the equations (B14,B15), in the $L \to \infty$ limit, admit a solution where $\phi(P)$ (resp. $\hat\phi(\hat P)$) is the intensity of a Dirac peak on $(h,n) = (-\infty,\infty)$ (resp. $(u,n) = (+\infty,\infty)$). The fraction of diverging rearrangements in (B16) is then seen to be equal to $\phi$.

In order to discuss the critical behavior of the m.s.r.d. it is convenient to derive an integrated version of Eqs. (B14,B15),

/ du QuAP){^ + tanhu)"' _ ( [ /" 1 - tanh/t^ ( u\\^~^ - 1 r^k-i 



J dP{u){l+ta.nhuy 



dPip y r l: . . . .J = ( / ^Pip) / dh — - — g,,„(F) ) = ^Qr' , (b20) 



dViP) 



Jdh Rh,x{P){l -tanh hy 



■ X exp 



ak 



ak 



dViP) 



J dP{h){l - tanh/i)™ 
where the former is valid for $n \ge 1$ and, following our conventions, we defined



/ du P„,^(P)(1 +tanhu)" 
/dP(u)(l + tanhw)™ 



(B21) 



(B22) 



n'>n 



Let us call $\alpha_p$ the threshold value for the appearance of a non-trivial solution to (B17,B18), and $\phi_p$ the corresponding order parameter. We want to determine the critical behavior of $Q_n$ in the neighborhood of this threshold, expecting to recover the phenomenology obtained in the $m = 1$ case. For simplicity we shall consider only the first critical regime at $\alpha = \alpha_p$, supposing an algebraic decay of $Q_n$ with an exponent $a$ to its asymptotic value $\phi_p$. More precisely we make the ansatz $Q_{h,n}(P) = \delta(h + \infty) \left( \phi(P) + A(P)\, n^{-a} \right) + o(n^{-a})$, with $A(P)$ a positive function. The computation proceeds as in Sec. A 2: one inserts this ansatz in (B20) and expands to order $n^{-2a}$. The algebraic decays translate into singularities around $x = 1$ in the generating functions of Eq. (B21); matching the three leading orders one obtains



dViP) 



dV{P) 



4>{P) 



A{P) 

'l-tanh/i\™ 



1 — exp 



2^{k - 2) 



^dP{h) 



r(l - 2a) 



exp 



'~2^* 



(B23) 



dr{P)A{P) , (B24) 



(B25) 



The first equality is a direct consequence of (B17,B18), the second is fulfilled by taking $A(P)$ in the eigenspace of eigenvalue 1 of the differential of (B17,B18), while the third fixes the exponent $a$. The computation of the parameter $\lambda$ at the 1RSB level thus leads to the expression found in the RS approach (cf. Eq. (66)), apart from the replacement of the critical connectivity and order parameter with their corresponding 1RSB values.



3. COL 



The random $q$-COL model is described at the 1RSB level by a distribution $\mathcal P(P)$ over distributions $P(\eta)$ of fields (laws on $\mathcal X = \{1,\dots,q\}$) that are invariant under the color permutations. $\mathcal P$ is solution of the distributional equation $P \overset{d}{=} F(P_1,\dots,P_l)$, where $l$ is a Poisson random variable of mean $c$ and $F$ is defined by

$$P(\eta) = \frac{1}{Z(P_1,\dots,P_l)} \int \prod_{i=1}^{l} dP_i(\eta_i)\; \delta(\eta - f(\eta_1,\dots,\eta_l))\; z(\eta_1,\dots,\eta_l)^m , \qquad f(\{\eta_i\})(\sigma) = \frac{1}{z(\{\eta_i\})} \prod_{i=1}^{l} (1 - \eta_i(\sigma)) . \tag{B26}$$



One can distinguish the hard fields which constrain a variable to take a definite color and define 



(B27) 



where $\tilde P$ is a normalized distribution with no intensity on the hard fields $\delta_\sigma$. The order parameter $\phi(P)$ is found from (B26) to obey:



0(p)p(p) = J l['^np^) s{p - p(Pi, . . . , Pi))j^ 



1=0 i=l 



Z(Pl,...,P) 



9-1 
p=0 



i-ir n ( / '^^^ ^(1 - ^(^))" - -m)) • (B28) 

i=i 9 / 



The average m.s.r.d. on random trees where the initial configurations are drawn from the 1RSB measure $\mu^{(1)}$ reads



dP{P) J dr, Y^via) g(^)(P) 



(B29) 



where $q^{(\sigma)}_{\eta,n}(P)$ is the conditional average of the joint law of size and field messages. Note that all values of $\sigma$ contribute in the same way above, by the symmetry between colors. The equation governing $q^{(\sigma)}_{\eta,n}(P)$ is



g(^^+i)(P)P(P) =Yp' I[^nP^) S{P - P(Pi, . . . , Pi))yrjs ^ 

„ I I / ' 



(B30) 
with



^ a) 



(B31) 



The order parameter $\phi = \int d\mathcal P(P)\, \phi(P)$ is again the height of the plateau in the $L \to \infty$ limit of the integrated average m.s.r.d. $Q_n$. One can indeed check that $q^{(\sigma)}_{\eta,n}(P)$ has a Dirac peak of intensity $\phi(P)/q$ in $(\eta, n) = (\delta_\sigma, \infty)$.

The study of the critical behavior at the transition $c_p$ corresponding to the appearance of hard fields in the 1RSB distributions is similar to the SAT case. We first write an integrated version of (B30),

1 



°° -c I 

e c 



II (q-lY ^ ^ 

1=0 ' cri,...,(Ti=2ni,...,n 



1 



mm 

T=2...,q 



n 



(g - 1) / dV{P) 



,1=1 

/d>7 (1-7^(1))"-! 77(a,)g^"^l(P) 
/dP(77) (l-??(a))™ 
which is independent of the value of $\sigma$. The ansatz $Q^{(\sigma)}_{\eta,n}(P) = \delta(\eta - \delta_\sigma) \left[ \phi(P) + A(P)\, n^{-a} \right] + o(n^{-a})$ is then inserted in this equation. The first two orders in an asymptotic expansion at large $n$ in powers of $n^{-a}$ are satisfied thanks to (B28) and by choosing $A(P)$ in the eigenspace of eigenvalue 1 of its differential. The third order fixes the value of the exponent $a$ through



r(l-a)^^^^_^^l-0V(,-i) 



r(l-2a) ' 01/(9-1) q 

which corresponds for $m = 1$ to the expression found in Eq. (65).



- / dV{P) 